MX Technologies, Inc.
Senior Director - Cloud Platform Engineering
MX Technologies, Inc., Lehi, Utah, United States, 84043
**Life at MX**
We are driven by our moral imperative to advance mankind - and it all starts with our people, product and purpose. We always carry a deep sense of drive and passion with us. If you thrive in a challenging work environment, surrounded by incredible team members who will help you grow, MX is the right place for you.
Come build with us and be part of an award-winning company that’s helping create meaningful and lasting change in the financial industry.
This role is personally accountable for production reliability and stability, including owning US time-zone incidents and Sev 0/1 events, leading cutovers, and directly representing site, infrastructure, and platforms to executive leadership during high-impact events. The expectation is that this leader stands at the front of the line during critical incidents and events such as migration and stabilization, makes real-time decisions, and clearly articulates risk, impact, and trade-offs to executives under pressure. This front-line ownership is intentional but transitional. A core measure of success for this role is building the systems, operating model, delegation structure, and leadership bench such that sustained, high-quality operations do not depend on the continuous personal presence of a single leader. The leader is expected to design for leverage: establishing clear ownership, developing managers and leaders, and embedding practices that scale reliability beyond individual heroics.
In parallel, they are expected to lead the full lifecycle of our infrastructure transformation, from data center exit and AWS migration through steady-state cloud operations and platform maturity. Success is measured not just by completing the migration, but by leaving behind a durable operating model with clear delegation, on-call ownership, and predictable executive engagement. The ideal candidate will have personally led large-scale data center exits and cloud migrations, not just advised or governed them.

* Own and evolve the end-to-end incident management lifecycle for infrastructure and platform services, grounded in SRE principles of reliability, learning, and automation.
* Define and enforce SLIs, SLOs, and error budgets for platform and infrastructure services, using them to guide operational decisions, release risk, and incident response.
* Operate on a clear severity framework (Sev 0/1/2) with explicit ownership, escalation paths, and decision rights.
* Lead the transition from incident response as heroics to incident prevention by design, embedding reliability, AI, capacity planning, and failure-mode analysis into platform roadmaps and change processes.
* Serve as the executive escalation owner for Sev 0 and Sev 1 incidents, personally leading response, trade-off decisions, and executive communications when required, while delegating incident command to empowered leaders to ensure sustained coverage.
* Hold clear decision authority under pressure, including the ability to unilaterally halt or roll back changes, trigger failovers, traffic shifts, and disaster recovery actions, reallocate engineering resources in demanding situations, and make go/no-go cutover decisions to protect customers and data, escalating to executive leadership when actions materially impact regulatory posture, contractual commitments, or significant financial exposure.
* Build and maintain a US-based SRE and incident leadership bench, with multiple leaders capable of acting as Incident Commander, owning executive updates, and coordinating cross-functional response.
* Lead through error budgets and reliability signals to drive blameless postmortems, root-cause analysis, and prioritization of systemic fixes over short-term feature velocity.
* Own the systematic reduction of operational toil and capacity tax across infrastructure and platform teams, with clear accountability for ensuring reactive work declines as systems mature.
* Hold teams accountable to measurable toil and resilience KPIs, such as percentage of engineer time spent on reactive work, on-call interrupt frequency, manual intervention rates, and incident recurrence.
* Influence readiness through game days, chaos testing, and migration-specific drills, validating both technical resilience and delegation models under pressure.
* Ensure incident management tooling, observability (metrics, logs, traces), and documentation are standardized, well-owned, and continuously improved.
* Own the design, scale, and effectiveness of the Cloud Platform Engineering organization, including SRE, cloud infrastructure, and platform engineering teams across geographies.
* Build and lead a strong leadership bench, developing senior managers, principal engineers, and architects who can operate independently at scale.
* Clearly define delegation, decision rights, and escalation paths so that critical incidents, migrations, and operational responsibilities are owned at the right level.
* Drive organizational clarity across charters, roles, responsibilities, and decision rights to reduce friction and increase delivery velocity.
* Actively recruit, retain, and develop top-tier infrastructure, SRE, and platform talent, including succession planning for critical roles.
* Establish a culture of engineering excellence, reliability, and continuous improvement, grounded in data, post-incident learning, and blameless accountability.
* Lead change management during periods of transformation, including data center exit, cloud migration, and operating model shifts.
* Foster strong partnerships with product, application engineering, security, and business leaders, ensuring platform teams are seen as strategic enablers and not service providers.
* Champion diversity of thought, inclusive leadership, and high team engagement across a growing, global organization.
* 15+ years of experience in infrastructure, cloud, SRE, or platform engineering.
* 7+ years leading large engineering organizations (managers of managers or equivalent).
* Direct, hands-on leadership of at least one full data center exit and AWS migration, including decommissioning of on-premises infrastructure.
* Deep technical expertise in AWS, including VPC networking, EC2, EKS/Kubernetes, RDS/Aurora, S3, IAM, and observability tooling.
* Strong experience operating highly available, distributed systems using SRE principles.
* Proven ability to lead complex, high-risk infrastructure transformations in production environments.
* Expertise in FinOps and cloud cost optimization practices.
* Demonstrated ability to drive standards and adoption across distributed engineering teams without relying on reporting lines.
* Skilled at operating as a front-line executive leader during critical situations, including migrations, upgrades, DR, incidents, and major production events.

At MX, we are a high-performance organization that thrives on trust and results. This role is based in Lehi, Utah, with flexibility for both in-office and remote work. We believe in empowering our team members to deliver exceptional outcomes while taking advantage of our incredible office space when it best supports their work. Our Utah office features onsite perks such as company-paid meals, massage therapists, a sports simulator, a gym, a mother’s lounge, and a meditation room, as well as meaningful interactions with amazing people.
We encourage team members to come together in the office to collaborate, kick off key projects, or strategize cross-functionally, fostering connection and innovation.
MX is proudly committed to recruiting and retaining a diverse and inclusive workforce. As an Equal Opportunity Employer, we never discriminate based on race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, military or veteran status, status as an individual with a disability, or other applicable legally protected characteristics. We particularly welcome applications