Senior Site Reliability Engineer - (Live Video Streaming)
Avesta Computer Services - Los Angeles, California, United States
Overview
Location: Los Angeles, California, United States OR Tempe, Arizona, United States (Hybrid)
Type: Full-time Permanent
JOB DESCRIPTION
Our client stands as a beacon of innovation, crafting world-class, large-scale digital products that redefine the entertainment experience. We're on the lookout for visionary individuals to join our pioneering team, tasked with shaping the future of streaming products. Now is your chance to be part of creating and delivering extraordinary digital experiences spanning Sports and Entertainment. As a key member of our team, you'll drive innovation and contribute significantly to our mission of pioneering the next generation of streaming products. Your opportunity to create unparalleled fan experiences for iconic sports events is here. Our current advanced digital solutions, accessed by millions across web, mobile, and living room devices, signify just the start of our ambitious journey.
ABOUT THE ROLE
Our client is hiring a Principal SRE to build and operate the infrastructure and platforms that support our live direct-to-consumer APIs for major live events such as the Super Bowl, World Cup, and World Series. The principal engineer will be the technical lead for solving thundering herd problems, including partnering with the application team to load test, scale up, and scale back down again, and will help design the platform and infrastructure to meet the application team's needs. A collaborative, peacemaker mindset is a must, along with fostering a culture of learning and continuous improvement for the entire team. The principal engineer will also work with the Director, Platform Engineering to visualize workflows and refine processes and policies to keep the team's throughput high.
A SNAPSHOT OF YOUR RESPONSIBILITIES
- Serve as technical lead for the implementation and operation of cloud-based infrastructure and platform, including EKS and other AWS services, supporting direct-to-consumer APIs and solving associated thundering herd problems, including load testing, scaling up, and scaling back down again
- Work closely with Video & Player Engineering and 3rd party teams to help design and implement scalability, cost visibility, and observability in the platform
- Help to mentor and train less senior members of the team
- Assist with product/technology selection, including evaluating maturity and support, and the design and implementation of POCs
- Work with the Director, Site Reliability Engineering to foster a culture of learning and continuous improvement, and help to conceptualize and visualize workflows and processes
- Perform post-incident analysis to identify root causes and potential workarounds/solutions
- Be fluid and open to change and evolving processes and tools
- Other duties as assigned
WHAT YOU WILL NEED
- Expert with EKS, Kubernetes, and AWS, including IAM, autoscaling, networking, and load balancing/request routing
- Proven experience with solving scalability problems both up and down, including thundering herd scenarios
- Expert with troubleshooting and root cause analysis
- Expert with at least 2 programming languages
- Strong analytical skills
- Strong communication skills, both verbal and written
- Proven experience with building deployment pipelines and enabling self-service
- Strong teamwork and willingness to collaborate with others
- Proven experience with training and mentoring engineers