Site Reliability Engineer, Datadog Specialist

S&P Global, Englewood, CO, United States

Grade Level (for internal use): 09

The Team:

The IT Operations team at S&P Dow Jones Indices owns and operates the production systems that power S&P DJI's global index platforms. Our focus is reliability, visibility, and operational excellence, ensuring critical market-facing services remain available, observable, and resilient.
Responsibilities and Impact

This role sits at the intersection of Site Reliability Engineering and Observability, focused on the hands‑on implementation and operation of enterprise telemetry platforms. The position supports application, infrastructure, and production support teams by ensuring systems are well-instrumented, observable, and diagnosable in production environments.
We are seeking a hands‑on Observability Engineer with strong experience using Datadog and modern telemetry tools. This is not a general DevOps or platform engineering role; it is a tool‑focused position responsible for implementing, operating, and continuously improving observability across applications, databases, and infrastructure within an established SRE framework.
Own and Evolve End-to-End Observability Using Datadog

APM, Distributed Tracing, DBM
Log ingestion, parsing, pipelines, and correlation
Synthetic monitoring, RUM (where applicable)
AI-driven alerting, Watchdog, and anomaly detection
Design and Enforce Monitoring Standards

Alert quality, signal‑to‑noise reduction
Golden signals, SLO/SLA-aligned monitoring
Consistent tagging, naming, and telemetry hygiene
Serve as the Primary Datadog Platform Specialist

Dashboards, monitors, service catalog, integrations
Cost visibility and optimization of logs/APM/DBM usage
Enablement and onboarding of application teams
Support Production Incident Response

Use Datadog, Splunk, and logs to triage incidents
Lead or support root‑cause analysis and post‑incident reviews
Improve observability gaps identified during incidents
Integrate telemetry with ITSM and incident-management tools such as ServiceNow and PagerDuty to support incident and change workflows
Partner with Engineering Teams to

Improve instrumentation (APM, custom metrics, logs)
Adopt OpenTelemetry where appropriate
Validate observability during releases and changes
Participate in DR testing, operational readiness reviews, and continuous improvement of SRE/IT Ops practices.
Compensation/Benefits Information

S&P Global states that the anticipated base salary range for this position is $90,000 to $122,000. Final base salary for this role will be based on the individual’s geographic location, as well as experience level, skill set, training, licenses and certifications.
In addition to base compensation, this role is eligible for an annual incentive plan. This role is not eligible for additional compensation such as an annual incentive bonus or sales commission plan.
This role is eligible to receive additional S&P Global benefits. For more information on the benefits we provide to our employees, please see https://spgbenefits.com/benefit-summaries/us.
Benefits

Health & Wellness: Health care coverage designed for the mind and body.
Flexible Downtime: Generous time off helps keep you energized for your time on.
Continuous Learning: Access a wealth of resources to grow your career and learn valuable new skills.
Invest in Your Future: Secure your financial future through competitive pay, retirement planning, a continuing education program with a company-matched student loan contribution, and financial wellness programs.
Family Friendly Perks: It’s not just about you. S&P Global has perks for your partners and little ones, too, with some best-in-class benefits for families.
Beyond the Basics: From retail discounts to referral incentive awards—small perks can make a big difference.
Basic Required Qualifications

4+ years of experience in Observability, SRE, or Production Operations roles
Strong, hands-on Datadog experience: APM, logs, DBM, dashboards, monitors, integrations
Experience working with telemetry concepts: Metrics, logs, traces, log correlation, distributed tracing
Working knowledge of AWS environments (EC2, ECS, RDS, S3, DynamoDB, etc.)
Ability to read and reason about application code (Java and/or Python) to support instrumentation, troubleshooting, and telemetry design (this is not a feature-development role).
Experience integrating monitoring tools with PagerDuty and ServiceNow
Strong troubleshooting, documentation, and communication skills
Additional Preferred Qualifications

Datadog certifications (APM, Logs, Fundamentals)
Exposure to Splunk, ELK, Dynatrace, or similar tools
Experience with OpenTelemetry (instrumentation or collectors)
Familiarity with CI/CD pipelines and containerized workloads
Experience supporting mission-critical, high-availability systems
Financial services, index, or data-platform experience
Location:

This is a hybrid role based at most of our U.S. offices. Working in the office 2-3 days a week is a requirement of the position.
Equal Opportunity Employer

S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.
If you need an accommodation during the application process due to a disability, please send an email to: EEO.Compliance@spglobal.com and your request will be forwarded to the appropriate person.
US Candidates Only: The EEO is the Law Poster (http://www.dol.gov/ofccp/regs/compliance/posters/pdf/eeopost.pdf) describes discrimination protections under federal law.
Pay Transparency Nondiscrimination Provision: https://www.dol.gov/sites/dolgov/files/ofccp/pdf/pay-transp_%20English_formattedESQA508c.pdf