AI Eval Engineer: Design & Publish AI Benchmarks

Braintrust Data, Inc., San Francisco, CA, United States

A leading AI observability platform is seeking an Eval Engineer to design and run evaluations of AI capabilities. The role focuses on turning ideas about AI into measurable experiments: you will analyze and compare models, define datasets, ensure experiments are reproducible, and publish results for the developer ecosystem. Ideal candidates have experience with evaluation systems and a passion for AI methodology. The position offers a competitive salary and flexible benefits.