Senior SRE Manager

WEX Inc., San Francisco, CA, United States

Locations: Portland, ME; San Francisco, CA; Chicago, IL; Dallas, TX

Time type: Full time

Posted on: Posted Yesterday

Job requisition ID: R21373

**About the Team/Role**

We are looking for a highly motivated and high-potential Senior Manager Site Reliability Engineering (SRE) to join our team as a technical leader and drive transformative impact across WEX’s platform reliability and operational excellence.

This is a particularly exciting time to be part of the SRE function at WEX. Our diverse product ecosystem supports a wide array of customer businesses and generates rich, complex telemetry across applications, infrastructure, and platforms. Ensuring these systems are scalable, observable, and resilient is critical to unlocking business value and customer success.

As a Senior Manager, SRE, you will play a pivotal role in shaping the reliability engineering strategy at WEX. You’ll architect and lead efforts that improve availability, performance, and efficiency at scale, driving initiatives across observability, automation, incident management, problem management, capacity planning, and performance optimization. You’ll be hands-on in building foundational tooling and frameworks while also acting as a multiplier: mentoring engineers, aligning cross-functional teams, and influencing platform decisions with a strong reliability lens.

You’ll work closely with engineering, product, and platform teams to instill SRE best practices and enable a shift toward proactive, scalable operations. Our team embraces agile development, a strong product mindset, and modern engineering practices, including AI-assisted operations and intelligent automation.

You’ll take on some of the most complex, high-impact challenges at WEX—supported by a team of highly skilled engineers and technical leaders invested in your success and growth.

If you’re a senior technical leader passionate about building reliable systems, leading through influence, and making a meaningful impact, this is a fantastic opportunity for you.

**How you’ll make an impact**

* Architect and oversee the implementation of mission-critical systems.
* Define and enforce SRE best practices and operational standards.
* Lead cross-functional initiatives to enhance system reliability and performance.
* Serve as a technical advisor for engineering leadership.
* Develop capacity planning and load testing strategies.
* Design self-healing and auto-recovery mechanisms.
* Drive cloud cost optimization and budgeting initiatives.
* Lead one or more SRE teams responsible for a major platform or domain.
* Partner with Engineering, Product, and Program stakeholders to align team delivery with business priorities.

**Experience you’ll bring**

* 8+ years of experience with a focus on large-scale system reliability.
* Expertise in system architecture, cloud platforms, and automation frameworks.
* Deep knowledge of Kubernetes, service meshes, and distributed tracing.
* Experience with monitoring and logging (Grafana, ELK stack, Splunk, etc.).
* Knowledge of containerization and orchestration (Docker, Kubernetes).
* Experience designing high-availability, fault-tolerant architectures.
* Strong understanding of database reliability engineering (MySQL, PostgreSQL, NoSQL).
* Knowledge of networking, databases, and storage architectures.
* Excellent incident command and crisis management skills.
* Experience setting team OKRs and aligning reliability goals with product and platform engineering strategies.

**Preferred Qualifications**

* Experience with multi-region and multi-cloud deployments.
* Deep expertise in scalable microservices and event-driven architectures.
* Strong experience with advanced observability tools (OpenTelemetry, Jaeger, Prometheus).
* Leadership in driving large-scale SRE transformations.
* Experience designing and developing AI-based solutions.
* Ability to influence engineering culture and process improvements.
* Experience in healthcare, insurance, or benefits technology.
* Understanding of the benefits domain, such as claims processing and eligibility lookup success rates.
* Experience working with compliance frameworks such as HIPAA, SOC 2, or HITRUST.
* Proven success building and scaling high-performing SRE teams in production environments.
* Ability to develop team-wide practices around incident management, postmortems, alert hygiene, and reliability KPIs.
* Skilled at coaching engineers through complex reliability challenges and career inflection points.

Pay Range: $175,600.00 - $204,300.00

WEX is a global commerce platform that helps businesses solve for operational complexities like employee benefits, managing and mobilizing fleets, and streamlining payments.

With over 6,500 employees, we work with large and small companies in more than 200 countries and territories, and can tailor our services to meet the unique needs of their businesses.

We hire people who share our passion for continuous innovation and client service that is unparalleled in the industry. Offering comprehensive and market competitive benefits, our offerings are designed to support your personal and professional well-being. If you’re looking for a growing career - come be part of WEX today.
To learn more about our employee benefits, please .

WEX is an equal opportunity employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to sex, race, color, age, national origin, religion, sexual orientation, gender identity, protected veteran status, disability or other protected status. WEX promotes a drug-free workplace.

Qualified individuals with a disability have the right to request a reasonable accommodation. If you require a reasonable accommodation as a result of your disability at any point in the job application process, please submit your request through our .

This form is for accommodation requests only and cannot be used to inquire about the status of applications.