Remote | Operations Research Model Prompt Evaluator — $60–$80/hour

24-MAG, San Francisco, CA, United States


Overview

We are sharing a specialised part‑time consulting opportunity for experienced operations research professionals with strong quantitative judgment, deep technical knowledge, and the ability to craft and verify high‑quality open‑ended prompts for AI model evaluation. This role supports an exciting collaboration with leading AI companies focused on improving frontier language models through high‑quality prompt authoring, verification, and evaluation workflows across core operations research and decision‑science domains.

Key Responsibilities

Professionals in this role may contribute to:

Prompt Authoring for AI Evaluation

Create original, open‑ended operations research prompts from assigned subdomains at varying difficulty levels

Develop prompts that require human judgment to evaluate the quality of AI responses

Help ensure that prompts are clear, technically rigorous, and suitable for model evaluation

Prompt Verification & Quality Review

Review authored prompts for clarity, scope alignment, difficulty accuracy, and uniqueness

Edit prompts and difficulty ratings where needed

Help maintain high standards for precision, quality, and consistency across evaluation tasks

Operations Research Reasoning Assessment

Apply expert judgment to assess the depth and quality of quantitative reasoning required

Work across areas such as optimization modeling, algorithmic analysis, stochastic reasoning, and decision science

Help improve model quality through carefully designed and verified technical prompts

Ideal Profile

Strong candidates may have a Master's degree or higher in Operations Research, Industrial Engineering, Applied Mathematics, or a closely related field

2–6 years of professional or research experience in optimization, logistics, or decision science

Strong command of mathematical programming, probabilistic modeling, and algorithmic methods

Excellent written English and the ability to craft precise, well‑scoped technical questions

Preferred Qualifications

Experience with solvers such as Gurobi or CPLEX

Experience with simulation tools

Strong familiarity with operations research subdomains such as linear and integer programming, network optimization, queuing theory, game theory, supply chain optimization, and simulation

High attention to detail and strong consistency in technical evaluation workflows

Why This Opportunity

Contribute specialised operations research expertise to a cutting‑edge AI collaboration

Help establish rigorous evaluation standards for frontier language models

Work on high‑impact prompt design and verification tasks with strong technical relevance

Flexible remote work with competitive hourly compensation

Contract Details

Independent contractor role

Fully remote with flexible scheduling

Hourly compensation of $60–$80 per hour

Expected commitment of 10+ hours per week

Asynchronous work format

Assignments may involve either authoring or verification tasks depending on project needs

Projects may be extended, shortened, or concluded early depending on project needs and performance

Weekly payments via Stripe or Wise

Work will not involve access to confidential or proprietary information from any employer, client, or institution

Please note: We are unable to support H‑1B or STEM OPT candidates at this time

Start date: Immediate
