
Senior Manager, DevOps & SRE – Platform Reliability & Global Operations

Qcells North America, San Francisco, CA, United States


Position Description

The Senior Manager, DevOps & SRE – Platform Reliability & Global Operations is a senior technical leader responsible for the reliability, scalability, security, and operational excellence of a complex, multi‑platform ecosystem spanning applications, workflows, event streaming, and data platforms.

Location & Work Arrangement

Candidates must be able to work primarily within Pacific or Central Time Zone business hours to support collaboration with global teams.

Employees located within 50 miles of a Qcells office (e.g., Irvine, San Francisco, Houston, or South Carolina locations) are expected to follow the company’s hybrid work policy of at least three in‑office days per week.

Responsibilities

Lead and scale a global, multi‑tier (L1/L2/L3) DevOps and SRE organization

Design and operate follow‑the‑sun on‑call and support models

Own incident management, including Sev‑1/Sev‑2 incident command and executive communication

Define and operate SLOs, SLIs, and error budgets across apps, workflows, events, and data pipelines

Oversee DevOps practices for CI/CD, Kubernetes, IaC, automation, and cost optimization

Ensure reliable operation of event‑driven and telemetry pipelines

Govern and manage third‑party DevOps and SRE vendors, including SLAs and escalations

Drive operational maturity: post‑mortems, automation, reliability improvements

Partner with security on secure operations, incident response, and compliance readiness

Platforms in Scope

Application Platforms: Kubernetes, containerized workloads, EMS telemetry and control

Workflow Orchestration: Fleet Manager, Power Automate, cross‑system workflows

Event & Streaming: Azure Event Hubs, event streams, Kafka, RabbitMQ

Data & Telemetry: Microsoft Fabric, Kusto, PostgreSQL, TimescaleDB, Cassandra

CI/CD & Infrastructure: GitHub Actions, Jenkins, Terraform, Helm, Ansible (Azure & AWS)

IAM across Azure and AWS

Experience with Salesforce and Snowflake preferred

Technical Strengths

Kubernetes and container platforms in production

Azure (required), AWS

Event streaming and messaging systems

Data pipelines and telemetry platforms

Power Pages and Power Automate

CI/CD, Infrastructure as Code, and automation

Observability and incident troubleshooting at scale

Operational Expectations

Escalation management for on‑call and major incidents

Willingness to work off‑hours when required

Comfortable making high‑impact decisions under pressure

Required Qualifications

15+ years in DevOps, SRE, Platform Engineering, or Production Operations

5+ years leading globally distributed engineering teams

Proven ownership of 24x7, mission‑critical production platforms

Strong experience managing third‑party vendors/managed service providers

Deep hands‑on experience with Kubernetes, cloud platforms, and event‑driven systems

Preferred Qualifications

Solar (renewable energy) industry experience

Use of AI Tools

Qcells expects team members to leverage AI models and AI‑assisted tools in their daily workflows where appropriate. Candidates should be comfortable working in an AI‑augmented environment and applying sound judgment when using AI‑generated outputs. During the interview process, candidates will be asked to share examples of how they have used AI tools or models in their work.

Salary Range

Disclosure of the salary range is required by the California Pay Transparency Act, and the range may differ depending on the location of candidates hired nationwide. Actual compensation is influenced by a wide array of factors, including but not limited to skill set, education, licenses and certifications, essential job duties and requirements, and the necessary experience relative to the job’s minimum qualifications.

This target salary range is for CA positions only and should not be interpreted as an offer of compensation.
