
Python Infrastructure Engineer - Model Evaluation
Alignerr, Seattle, WA, United States
Python Infrastructure Engineer — Model Evaluation (AI Training)
What if your Python expertise could directly shape how the world’s most advanced AI models are built, tested, and improved? We’re looking for a Senior Python Infrastructure Engineer to design and build the data pipelines, annotation tooling, and evaluation systems that leading AI labs depend on to train and validate next-generation models.
This is a fully remote contract role with flexible hours — you’ll be working on real production systems at the cutting edge of AI development.
Organization: Alignerr
Type: Hourly Contract
Location: Remote
Commitment: 20–40 hours/week
What You’ll Do
Design, build, and optimize high-performance Python systems supporting AI data pipelines and model evaluation workflows
Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
Build and maintain evaluation harnesses for ML models, integrating with inference frameworks
Improve reliability, performance, and safety across existing Python codebases
Implement observability, metrics collection, and monitoring to track system reliability and model performance
Identify bottlenecks and edge cases in data and system behavior, and ship scalable fixes
Collaborate with data, research, and engineering teams to support model training and evaluation workflows
Participate in synchronous design reviews to iterate on system architecture and implementation decisions
Who You Are
Native or fluent English speaker with clear written and verbal communication skills
Full-stack developer with a strong systems programming background
3–5+ years of professional experience writing production-grade Python
Experienced building evaluation harnesses for ML models and integrating with inference frameworks
Strong background in observability, metrics collection, and system reliability monitoring
Able to commit 20–40 hours per week consistently
Self-directed and comfortable working asynchronously across distributed teams
Nice to Have
Prior experience with data annotation, data quality, or evaluation systems
Familiarity with AI/ML workflows, model training, or benchmarking pipelines
Experience with distributed systems or developer tooling
Background in MLOps, infrastructure engineering, or platform engineering
Why Join Us
Work on real production systems powering some of the most advanced AI research in the world
Fully remote and flexible — structure your work around your life
Freelance autonomy with the depth and meaning of high-impact engineering work
Contribute directly to AI infrastructure that shapes how next-generation models are built and evaluated
Potential for ongoing work and contract extension as new projects launch