
Software Engineer, Inference - Performance Optimization
OpenAI, Los Angeles, CA, United States
Inference – San Francisco
About the Team
Our team analyzes inference stack performance across the application, model, and fleet layers to identify bottlenecks and drive faster, cheaper inference. We combine systems profiling, benchmarking, and analysis to understand where time and cost are spent, then turn that understanding into performance optimizations and models that project performance and capacity needs for future launches.
About the Role
In this role, you will model inference performance across the application, model, and fleet layers with increasing fidelity. You will build cost-to-serve estimates from microbenchmarks and create tools that help cross-functional teams reason about latency, capacity, utilization, and cost tradeoffs.
Responsibilities
Build and refine performance models that translate microbenchmark results into cost-to-serve estimates.
Analyze inference workloads end to end across applications, models, and fleet infrastructure.
Enhance tooling that identifies latency and throughput bottlenecks across layers.
Partner with other teams to turn performance insights into concrete improvements and to project how future changes will affect inference performance.
Qualifications
You may thrive in this role if you:
Enjoy reasoning from first principles about distributed systems, model inference, and hardware efficiency.
Are comfortable working across abstraction layers, from application behavior to kernels, accelerators, networking, and fleet scheduling.
Have deep expertise with performance profiling, benchmarking, analysis, and optimization.
Enjoy collaborating with engineering and research teams to improve real production systems.
Compensation
$295K – $555K + equity
OpenAI is an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other applicable legally protected characteristic.