CBTS
CBTS serves enterprise and midmarket clients in all industries across the United States and Canada. CBTS combines deep technical expertise with a full suite of flexible technology solutions, including Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, CBTS delivers comprehensive technology solutions for its clients' transformative business initiatives. For more information, please visit www.cbts.com.
OnX is a leading technology solution provider that serves businesses, healthcare organizations, and government agencies across Canada. OnX combines deep technical expertise with a full suite of flexible technology solutions, including Generative AI, Application Modernization, Managed Hybrid Cloud, Cybersecurity, Unified Communications, and Infrastructure solutions. From developing and deploying modern applications and the secure, scalable platforms on which they run, to managing, monitoring, and optimizing their operations, OnX delivers comprehensive technology solutions for its clients’ transformative business initiatives. For more information, please visit www.onx.com.
Job Title: Senior Site Reliability Engineer (SRE) – Splunk Specialist
Location: Remote
Experience: 6+ years
Employment Type: Full-time
Role Overview:
We are seeking a Senior Site Reliability Engineer (SRE) with strong experience in Splunk to ensure the reliability, scalability, and performance of our systems. The ideal candidate will design and implement monitoring solutions, automate operational tasks, and collaborate with development teams to improve system resilience and observability.
Key Responsibilities:
Design, implement, and maintain Splunk dashboards, alerts, and reports for system monitoring and incident management.
Develop and optimize observability solutions for infrastructure and applications.
Automate operational processes using scripting and configuration management tools.
Collaborate with development and operations teams to improve system reliability and performance.
Troubleshoot and resolve production issues, ensuring minimal downtime.
Implement incident response and root cause analysis processes.
Drive capacity planning, performance tuning, and scalability improvements.
Ensure compliance with security and governance standards.
Required Skills & Qualifications:
Strong experience with Splunk (configuration, dashboard creation, alerting, log analysis).
Proficiency in Linux/Unix systems administration.
Hands-on experience with cloud platforms (AWS, Azure, or GCP).
Strong scripting skills in Python, Shell, or similar languages.
Familiarity with CI/CD pipelines and automation tools (Ansible, Terraform, Jenkins).
Knowledge of monitoring and observability tools (Prometheus, Grafana, ELK).
Excellent troubleshooting and problem-solving skills.
Preferred Skills:
Experience with containerization and orchestration (Docker, Kubernetes).
Exposure to incident management frameworks (ITIL, SRE best practices).
Understanding of security monitoring and compliance.
Education:
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.