Sr. Production Engineer (Frontend)

ektello · New York, NY, USA · 2 months ago

Pay: $85-$100/hr
Job type: Full Time

Sr Production Engineer
Duration:

12 months (with possibility of extension based on performance and business needs)

Location:

Remote, US (occasional travel to Yahoo offices in Sunnyvale, CA or New York, NY)

Coding test:

Required

Pay:

$85-$100/hr

Role Fit
This role sits within Yahoo Mail's Production Engineering team. Engineers in this role directly support cloud infrastructure reliability, cost efficiency, and automation for one of the world's largest consumer email platforms, serving hundreds of millions of users globally.

Overview Of The Team
Yahoo Mail Production Engineering manages GCP-based infrastructure, including GKE clusters, Compute Engine, Dataproc, Vertex AI, and other GCP services. The team is responsible for production reliability, capacity planning, cost optimization, CI/CD pipelines, MLOps, and infrastructure-as-code across 40+ GCP projects at petabyte scale. We work closely with software architects, developers, and product managers to deliver end-to-end results.

Primary responsibilities (daily/weekly)

Operate, monitor, and improve GKE applications, analytics, and ML production workloads

Manage Terraform/Ansible/Helm IaC for GCP resource provisioning and policy enforcement

Participate in on-call rotation for production incidents

Review and improve CI/CD pipelines for services written in Python, Node.js, and Java

Collaborate with architects and developers on infrastructure architecture and design

Automate cloud operations through programmable and secure solutions

Leverage AI-driven tools for development agents, troubleshooting, and automation

Key projects or initiatives for the role

On-prem to GCP migration of large-scale Yahoo Mail workloads

Analytics pipeline reliability improvements and platform work (Vertex AI, Generative AI, BigQuery, Looker, Dataproc)

Success metrics or KPIs for this role

On-call incident resolution time and escalation rate (MTTD, MTTR, MTTE)

Terraform/IaC coverage of managed resources

CI/CD pipeline reliability and deployment velocity

Progress on on-prem to GCP migration milestones

Sprint goal achievement (SMART goals per sprint)

Technical (Required)

5+ years in SRE, DevOps, Infrastructure, or Cloud Operations with on-call duties

GCP services proficiency: GKE, GCE, Networking, Security, CI/CD, and common cloud technologies

IaC proficiency: Terraform, Ansible, and Helm Charts

Programming in Python, Node.js, and Java; ability to build CI/CD pipelines for services written in these languages

Linux, TCP/IP, HTTP, mail protocols, DNS, CDN, load balancers, and troubleshooting

Experience with large-scale production applications, systems, and networks

Technical (Advantageous)

Cloud databases and storage: GCS, Cloud SQL, Spanner, Memorystore

ML/AI platforms: Vertex AI, Generative AI, BigQuery, Looker, Dataproc

Cloud Observability and OpenTelemetry

Proven track record migrating on-prem infrastructure to GCP

Operational experience in both on-prem and cloud environments

Ideal experience level (years, leadership, industries)
5+ years of total cloud/SRE experience, with a preference for GCP. Experience at large-scale internet companies running petabyte-scale production data systems is strongly preferred.
