Production Engineer

US Tech Solutions, Plano, TX, USA

Job type: Full Time


Responsibilities:
Apply 3-4 years of experience in production engineering and site reliability engineering (SRE) to design, implement, and maintain highly available, scalable, and resilient systems.
Own end-to-end operational responsibilities, including monitoring, incident response, root cause analysis, capacity planning, and automation, to ensure optimal system performance and reliability in production environments.
Collaborate cross-functionally with development, QA, and infrastructure teams to streamline CI/CD pipelines, automate deployments, and enforce best practices for security, compliance, and disaster recovery.
Use a broad set of tools and technologies to proactively detect, troubleshoot, and resolve production issues, minimizing downtime and meeting service-level objectives (SLOs) and service-level agreements (SLAs).
Skills: Java, JavaScript, cloud-based microservices, Spring Boot, AWS
Build, deploy, and maintain cloud-native microservices using Java, Spring Boot, and JavaScript frameworks, ensuring high availability and scalability.
Design and implement RESTful APIs and event-driven architectures using AWS services such as Lambda, ECS/EKS, SQS, and SNS.
Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or AWS CodePipeline for automated testing and deployment.
Monitor application and infrastructure health using AWS CloudWatch, Prometheus, Grafana, and distributed tracing tools like Jaeger or AWS X-Ray.
Troubleshoot production issues, perform root cause analysis, and implement fixes to improve system reliability.
Implement security controls including IAM roles, OAuth2, JWT, and encryption for data in transit and at rest.
Collaborate with cross-functional teams to design fault-tolerant, resilient systems with automated failover and recovery.
Optimize cloud resource usage and cost through rightsizing and autoscaling configurations.
Automate operational tasks and incident response using scripting and infrastructure as code (Terraform, CloudFormation).
Maintain detailed documentation of system architecture, deployment processes, and operational runbooks.