Part-Time Research Data Scientist

Califesciences, South San Francisco, CA, United States

We’re hiring a part-time Research Data Scientist to lead end-to-end preparation of complex, large-scale health datasets for peer-reviewed publication. This role centers on cleaning, harmonizing, and structuring messy, multi-source datasets, followed by advanced statistical analysis and machine learning to generate publishable insights.

You’ll work with survey, observational, and real-world health data, building reproducible analytical workflows that meet academic research standards. This role is best suited for a PhD‑trained data scientist or quantitative researcher with deep experience in machine learning, advanced statistics, and real-world data analysis.

Key Responsibilities

Data Cleaning & Harmonization

Clean, normalize, and integrate messy datasets from multiple sources (e.g., survey data from longitudinal studies)

Resolve inconsistencies and schema mismatches across datasets

Design scalable approaches to dataset harmonization for cross‑study comparability

Data Pipeline Development

Build and maintain reproducible data processing workflows for large‑scale datasets

Structure datasets for downstream statistical modeling and publication‑ready outputs

Implement version‑controlled workflows for data processing and analysis

Statistical Analysis & Machine Learning

Apply advanced statistical methods (e.g., mixed‑effects models, causal inference, longitudinal modeling)

Develop, validate, and interpret machine learning models for large‑scale observational data as needed

Ensure methodological rigor aligned with peer‑reviewed research standards

Research Collaboration

Partner with researchers to refine hypotheses, define analytic strategies, and interpret findings

Translate complex analyses into clear, defensible results for academic publication

Reproducibility & Publication Support

Develop reproducible codebases and documentation (e.g., notebooks, pipelines)

Prepare datasets, figures, and statistical outputs for manuscripts, abstracts, and reports

Contribute to methodological transparency and auditability of analyses

Write publication‑ready technical text, including Results and Methods sections for manuscripts (required)

Qualifications

PhD (preferred) in Data Science, Statistics, Biostatistics, Epidemiology, Computer Science, Experimental Psychology, or a related quantitative field

3–5+ years of experience working with large, complex datasets in research, healthcare, or applied data science

Strong expertise in data cleaning, preprocessing, and dataset harmonization at scale

Advanced proficiency in Python or R (e.g., pandas, tidyverse, scikit‑learn, statsmodels), or comparable programming experience

Deep experience with machine learning and advanced statistical methods

Strong foundation in reproducible research practices

Ability to communicate technical findings clearly to interdisciplinary teams and to collaborate with team members on high‑quality publications

Preferred Qualifications

Prior experience preparing analyses for peer‑reviewed publication

Familiarity with survey data (Qualtrics, REDCap) and/or healthcare data standards (FHIR)

Background in public health, epidemiology, or biostatistics

Experience with causal inference, longitudinal analysis, or real‑world evidence studies

Experience working with messy, real‑world observational datasets across multiple sources

Familiarity with cloud or distributed data tools (AWS, GCP, Spark)

Background in, or familiarity with, cannabinoid research
