
Site Reliability Engineer ID53670
AgileEngine, LLC., New York, NY, United States
ABOUT THE ROLE
We are looking for a mid-level SRE Operations Engineer to maintain reliability across a cloud-based SaaS platform. You’ll handle live incidents, improve observability, and reduce toil through automation using Kubernetes, Terraform, Grafana, and AWS. The role is hands-on and execution-focused, with real ownership of CI/CD pipelines, GitOps workflows, and on-call rotations.
WHAT YOU WILL DO
Monitor and support production and staging environments to ensure availability, performance, and stability;
Respond to incidents, perform triage and root cause analysis, and contribute to remediation efforts;
Participate in on-call rotations with defined SLAs;
Handle operational requests from internal teams;
Maintain and improve monitoring, alerting, dashboards, logs, and metrics;
Support CI/CD pipelines, production releases, and GitOps workflows;
Contribute to automation initiatives to reduce operational overhead;
Maintain and improve Kubernetes-based infrastructure and containerized workloads;
Support Infrastructure as Code practices and environment improvements.
MUST HAVES
2+ years of experience in Site Reliability Engineering, DevOps, or Production Operations;
Experience supporting production environments on AWS;
Experience supporting production SaaS applications;
Strong understanding of CI/CD systems (GitHub Actions, Jenkins, CircleCI);
Experience with GitOps and Git fundamentals;
Experience using GitHub, Jira, and Confluence;
Experience with Kubernetes (EKS, kOps, or similar);
Experience with Docker and containerization;
Experience with observability tools (Grafana, Prometheus, Loki, PagerDuty);
Proficiency in scripting (Bash, Python, or Go);
Experience with Infrastructure as Code (Terraform, Helm);
Ability to work within structured operational processes and SLAs;
Strong written and verbal English communication skills;
Self-driven with a growth mindset.
NICE TO HAVES
AWS certifications such as Solutions Architect, DevOps Engineer, or SysOps Administrator;
Experience with multi-tenant SaaS environments;
Experience working in globally distributed teams;
Familiarity with ChatOps practices;
Experience improving monitoring quality and reducing alert fatigue.
PERKS AND BENEFITS
Professional growth: mentorship, TechTalks, and personalized growth roadmaps.
Competitive compensation: USD-based pay with education, fitness, and team activity budgets.
Exciting projects: modern solutions for Fortune 500 and top product companies.
Flextime: flexible schedule with remote and office options.