
Senior Performance Engineer - AI Platforms (PSAP Team)
Red Hat, Raleigh, NC, United States
About the Job
The Red Hat Performance and Scale Engineering team seeks a Senior Performance Engineer for the PSAP team. The role focuses on driving performance and scalability of distributed inference for Large Language Models (LLMs) within the llm-d open source project. You will analyze, model, and optimize distributed LLM systems to deliver industry‑leading throughput, latency, and cost efficiency across Red Hat’s AI platforms.
What You Will Do
Define and track key performance indicators (KPIs) and service level objectives (SLOs) for large‑scale, distributed LLM inference services in Kubernetes/OpenShift.
Contribute to the performance roadmap for distributed inference, including multi‑node and multi‑GPU scaling studies, interconnect performance analysis, and competitive benchmarking.
Formulate performance test plans, execute benchmarks, and analyze results to drive improvements and detect performance issues.
Develop and maintain tools, scripts, and automated solutions for performance benchmarking.
Collaborate with cross‑functional engineering teams to identify and address performance issues.
Partner with DevOps to embed performance gates into GitHub Actions/OpenShift Pipelines.
Explore emerging AI technologies and identify opportunities to incorporate new AI capabilities into workflows and tooling.
Triage performance‑related field and customer escalations; document findings as upstream issues and backlog items.
Publish results, recommendations, and best practices through internal reports, presentations, external blogs, and official documentation.
Represent the team at internal and external conferences.
What You Will Have
5+ years of software engineering experience, with at least 3 years focused on performance engineering or systems‑level development.
Strong understanding of operating systems and distributed systems.
Foundational knowledge of AI and LLM inference workflows.
Proficiency in Python, Linux, and Bash for data and machine‑learning workflows.
Excellent communication skills and ability to translate performance data into clear business value.
Commitment to open source principles.
The Following Is Considered a Plus
Master’s or PhD in Computer Science, AI, or a related field.
Experience contributing to open source projects or leading community initiatives.
Hands‑on experience with Kubernetes or OpenShift.
Familiarity with performance and observability tools such as perf, eBPF, Nsight Systems, and PyTorch Profiler.
Experience with modern LLM inference stacks such as vLLM, TensorRT-LLM, Hugging Face TGI, and Triton Inference Server.
Pay Transparency
Red Hat determines compensation based on several factors, including job location, experience, applicable skills and training, external market value, and internal pay equity. The annual salary is one component of Red Hat’s compensation package. The position may be eligible for bonus, commission, and/or equity. For Remote‑US locations, the salary range may differ based on location.
Salary
The salary range for this position is $136,320.00 – $225,090.00. The actual offer will be based on qualifications.
Benefits
Comprehensive medical, dental, and vision coverage.
Flexible Spending Accounts for healthcare and dependent care.
Health Savings Account.
401(k) retirement plan with employer match.
Paid time off and holidays.
Paid parental leave plans.
Leave benefits including disability, paid family medical leave, and paid military leave.
Additional benefits: employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more.
Equal Opportunity Policy (EEO)
Red Hat is an equal‑opportunity workplace and an affirmative action employer. We review applications without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, disability, medical condition, marital status, or any other basis prohibited by law.