CBTS

Senior SRE

CBTS, Nashville, Tennessee, United States

CBTS serves enterprise and midmarket clients in all industries across the United States and Canada. CBTS combines deep technical expertise with a full suite of flexible technology solutions, including Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, CBTS delivers comprehensive technology solutions for its clients' transformative business initiatives. For more information, please visit www.cbts.com.

OnX is a leading technology solution provider that serves businesses, healthcare organizations, and government agencies across Canada. OnX combines deep technical expertise with a full suite of flexible technology solutions, including Generative AI, Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, OnX delivers comprehensive technology solutions for its clients' transformative business initiatives. For more information, please visit www.onx.com.

Job Title:

Senior Site Reliability Engineer (SRE) – Splunk Specialist

Location:

Remote

Experience:

6+ years

Employment Type:

Full-time

Role Overview:

We are seeking a Senior Site Reliability Engineer (SRE) with strong experience in Splunk to ensure the reliability, scalability, and performance of our systems. The ideal candidate will design and implement monitoring solutions, automate operational tasks, and collaborate with development teams to improve system resilience and observability.

Key Responsibilities:

Design, implement, and maintain Splunk dashboards, alerts, and reports for system monitoring and incident management.

Develop and optimize observability solutions for infrastructure and applications.

Automate operational processes using scripting and configuration management tools.

Collaborate with development and operations teams to improve system reliability and performance.

Troubleshoot and resolve production issues, ensuring minimal downtime.

Implement incident response and root cause analysis processes.

Drive capacity planning, performance tuning, and scalability improvements.

Ensure compliance with security and governance standards.

Required Skills & Qualifications:

Strong experience with Splunk (configuration, dashboard creation, alerting, log analysis).

Proficiency in Linux/Unix systems administration.

Hands-on experience with cloud platforms (AWS, Azure, or GCP).

Strong scripting skills in Python, Shell, or similar languages.

Familiarity with CI/CD pipelines and automation tools (Ansible, Terraform, Jenkins).

Knowledge of monitoring and observability tools (Prometheus, Grafana, ELK).

Excellent troubleshooting and problem-solving skills.

Preferred Skills:

Experience with containerization and orchestration (Docker, Kubernetes).

Exposure to incident management frameworks (ITIL, SRE best practices).

Understanding of security monitoring and compliance.

Education:

Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
