Production Engineer (Java & AWS Cloud Infrastructure)

Kaygen, Plano, TX, USA

Job type: Full Time


U.S. citizens and Green Card holders preferred.
We are seeking a highly skilled Production Engineer to bridge the gap between application development and system operations. In this role, you will use your software engineering background to ensure our core platforms are highly available, scalable, and resilient.
You won't just monitor servers. You will dive directly into application code written in Java and Spring Boot to debug bottlenecks, automate infrastructure deployment on AWS, and optimize production performance. If you approach operational challenges as software problems, we want you on our team.
Key Responsibilities:
Production Operations & Reliability:

Own end-to-end production environments. Lead incident response, conduct Root Cause Analysis (RCA), and optimize systems to meet strict SLA/SLO and MTTR targets.
Infrastructure as Code (IaC):

Treat infrastructure as software by writing clean, reusable Terraform or CloudFormation modules that automate cloud provisioning and eliminate configuration drift caused by manual changes.
Scalable Systems Architecture:

Partner with dev teams to architect fault-tolerant, cloud-native microservices utilizing automated failover, autoscaling, and traffic routing.
Continuous Delivery Automation:

Build, scale, and maintain robust CI/CD pipelines (Jenkins, GitLab CI, or AWS CodePipeline) to streamline automated testing and deployments.
Observability & Performance Tuning:

Design and manage centralized monitoring and distributed tracing stacks using Prometheus, Grafana, AWS CloudWatch, and Jaeger or AWS X-Ray to catch issues before they impact users.
Production Security:

Implement and enforce enterprise-grade security controls, including AWS IAM roles, OAuth2, JWT, and data encryption.
Required Skills & Qualifications:
Experience:

3–5 years of dedicated experience in Production Engineering, Site Reliability Engineering (SRE), or DevOps.
Backend Engineering:

Strong proficiency in Java and Spring Boot, with the ability to read, trace, and debug complex microservice applications.
AWS & Containerization:

Hands-on experience with core AWS and container infrastructure, specifically Docker, Kubernetes (EKS) or ECS, Lambda, SQS, SNS, and Application Load Balancers (ALB).
Automation:

Practical experience using Terraform for cloud infrastructure automation and scripting.
Telemetry Stack:

Deep practical knowledge of Prometheus and Grafana, or AWS CloudWatch, for real-time visibility.
Environment:

Comfortable working in fast-paced Agile/Scrum environments and participating in production on-call rotations.
What Will Make You Stand Out:
Proven track record of migrating legacy monoliths into cloud-native microservices.
Experience running cost-optimization and cloud-resource rightsizing initiatives.
A metric-driven mindset focused on improving system uptime and reducing operational overhead.