
Data Scientist (AI Quality & Evaluation)
Bioscope.ai, Inc., Boston, MA, United States
About the Role
We're looking for a Data Scientist to own the quality, reliability, and trustworthiness of our clinical AI outputs. You'll build the systems that ensure our AI "knows what it doesn't know"—developing evaluation frameworks, calibrated confidence scoring, and automated quality assurance that physicians can actually trust.
What You'll Do
Design and implement automated evaluation pipelines that assess AI output quality, accuracy, and safety at scale
Develop uncertainty quantification systems where confidence scores meaningfully correlate with accuracy
Build comprehensive evaluation frameworks combining automated assessment with clinician-validated test cases
Implement feedback loops that continuously improve model outputs based on validation signals
Establish scalable quality gates that catch errors before they reach end users
Contribute to model alignment and fine-tuning efforts
Qualifications
Required
Strong foundation in deep learning frameworks (PyTorch) and LLM architectures
Experience with model evaluation, benchmarking, and quality metrics
Proficiency in Python and modern ML development tools
Strong statistical foundations
Ability to read, implement, and extend research papers
Excellent communication skills
Preferred
Master's degree in Computer Science, Machine Learning, Statistics, or related quantitative field (PhD preferred)
Publications in top ML/AI venues (NeurIPS, ICML, ICLR, ACL)
Experience with RLHF, DPO, or preference optimization techniques
Background in healthcare AI or regulated industries
Experience building evaluation systems for production LLM applications