
Site Reliability Engineer (Dallas)
MM International, LLC, Dallas, TX, United States
Site Reliability Engineer (Contract-to-Hire) (Onsite Interview)
Location: Dallas, TX (Hybrid/Onsite – Local Only)
Duration: 3-Month Contract to Hire
Interview: Onsite
Schedule: Mon-Fri, 8 AM–5 PM PST
Job Description:
Seeking an experienced Site Reliability Engineer (SRE) with 7+ years of SRE experience and a strong production engineering background. The candidate should have hands-on experience in incident management, on-call support, RCA, automation, observability, and infrastructure reliability.
Required Skills:
Strong experience with Azure, Kubernetes, Docker
CI/CD using GitHub Actions
Monitoring/Observability tools (Dynatrace preferred)
Automation using Ansible, Python, Bash
Support of Java applications in production
Linux and Windows administration
Strong understanding of SLIs, SLOs, Error Budgets
Experience leading or contributing to major production incidents
Responsibilities:
Manage and improve system reliability, scalability, and performance
Support production environments and participate in on-call rotation
Drive incident response, root cause analysis, and corrective actions
Build automation to reduce operational toil
Enhance observability, monitoring, and operational reporting
Collaborate with engineering teams on reliability improvements