
Remote | Mathematics Model Prompt Evaluator — $25–$60/hour

24-MAG, San Francisco, CA, United States


We are sharing a specialised part‑time consulting opportunity for expert mathematicians with strong backgrounds in mathematical reasoning, proof writing, formal analysis, and high‑quality technical question design. This role supports an exciting collaboration with a leading frontier AI research laboratory focused on improving mathematical reasoning and model evaluation through rigorous, high‑quality prompt authoring and verification workflows. Selected professionals will author and verify open‑ended mathematical problems across core subdomains such as probability, statistics, algebra, differential equations, geometry, graph theory, and number theory. The goal is to help advanced AI systems produce higher‑quality reasoning in complex mathematical contexts by building challenging, unambiguous evaluation tasks and applying expert judgment to assess prompt quality, scope, and difficulty.

Key Responsibilities

Prompt Authoring

Create original, open‑ended prompts within an assigned mathematical subdomain across varying difficulty levels, including undergraduate, advanced undergraduate, and graduate or professional levels.

Design prompts that require human judgment to evaluate the quality of the AI’s response, including tasks involving proof construction, formal reasoning, or multi‑step mathematical analysis.

Ensure prompts are clear, well‑scoped, and sufficiently challenging for meaningful model evaluation.

Prompt Verification & Quality Review

Review authored prompts for clarity, uniqueness, scope alignment, and difficulty accuracy.

Edit prompts and difficulty assignments where standards are not met.

Ensure that prompts within each task are sufficiently distinct from one another and aligned with project expectations.

Mathematical Reasoning Evaluation Support

Apply expert judgment to assess the depth and quality of mathematical reasoning required by each prompt.

Help establish rigorous evaluation standards for frontier language models operating in mathematical domains.

Support high‑quality task design across a broad set of mathematical subfields.

Ideal Profile

A Master’s degree or higher in Mathematics, Applied Mathematics, Statistics, or a closely related field.

2–6 years of professional or research experience in a quantitative field.

Strong command of graduate‑level mathematical concepts including proof writing, analysis, and formal reasoning.

Excellent written English and the ability to craft precise, well‑scoped technical questions.

Comfort working across structured evaluation tasks requiring depth, clarity, and mathematical judgment.

Preferred Qualifications

Experience in academic research, mathematical competition design, or quantitative industry roles.

Experience across one or more of the following areas: probability and statistics, algebra including linear algebra, ordinary or partial differential equations and dynamical systems, geometry, graph theory, or number theory.

Ability to design open‑ended mathematical questions that require nuanced reasoning rather than simple factual recall.

Strong editorial judgment when reviewing scope, clarity, and difficulty calibration.

Benefits

Contribute specialised mathematics expertise to a cutting‑edge AI collaboration.

Help improve how advanced AI systems reason through complex mathematical problems and formal analytical tasks.

Work on high‑impact evaluation workflows that shape mathematical model benchmarking standards.

Flexible remote work with structured expectations and competitive hourly compensation.

Contract Details

Independent contractor role.

Fully remote with flexible scheduling.

Hourly compensation of $25–$60 per hour.

Expected commitment of 10+ hours per week.

Work is fully asynchronous.

Projects may be extended, shortened, or concluded early depending on project needs and performance.

Weekly payments via Stripe or Wise.

Work will not involve access to confidential or proprietary information from any employer, client, or institution.

Please note: We are unable to support H‑1B or STEM OPT candidates at this time.

Start date: Immediate.
