Director of Platform Engineering & Operations -- MAZDC5764907

Compunnel Inc., Charlotte, NC, United States

Role Summary

The Director of Platform Engineering & Operations is responsible for COMPANY’s entire technology platform, overseeing both customer-facing systems and internal infrastructure to ensure 24x7 availability, security, and scalability across Azure cloud and on-premises environments. This hands‑on leadership role balances technical execution with strategic management: building a high-performing team, driving operational excellence, implementing security controls, and supporting the company’s rapid growth.

Key Responsibilities

Build, mentor, and retain a team of 8 engineers across infrastructure/network, DevOps/SRE, and desktop/end‑user support, providing technical coaching, career development, and performance management.
Own platform strategy, roadmap, and execution, aligned with business objectives, product needs, and customer SLAs.
Define and track operational KPIs (availability, MTTR, change success rate, incident volume, cloud cost efficiency) and present regular updates to the CTO and executive team.
Establish an operational cadence of incident reviews, a change advisory board (CAB), service desk metric reviews, and team retrospectives, and foster a culture of continuous improvement.
Infrastructure & Cloud Operations

Own the design, implementation, and 24x7 operation of COMPANY’s hybrid infrastructure (Azure + on‑premises) supporting both production and internal corporate systems.
Ensure high availability, scalability, performance, security, and cost efficiency across all environments.
Lead hands‑on architecture and implementation of cloud infrastructure, networking, identity management (Microsoft Entra ID, formerly Azure AD; RBAC), storage, backup, monitoring, and observability.
Drive cloud optimization initiatives: rightsizing, reserved capacity, architectural improvements, and cost governance across Azure workloads.
Define and enforce platform standards for networking, security, identity, logging, alerting, and operational discipline.

DevOps & Site Reliability Engineering

Lead DevOps and SRE transformation: implement CI/CD pipelines, Infrastructure as Code (Terraform, ARM/Bicep), containerization (Kubernetes), and modern deployment practices.
Build and operate Kubernetes clusters hands‑on, including container orchestration, service mesh, and cloud‑native architecture patterns.
Establish SRE principles: error budgets, SLOs/SLIs, blameless postmortems, observability (metrics/logs/traces), and reliability engineering culture.
Build and optimize CI/CD tooling and workflows to improve release velocity, reduce deployment risk, and increase developer productivity.
Implement robust change management processes (risk assessment, testing, communication, rollback procedures) that balance speed, safety, and audit readiness.

Information Security & Compliance

Implement security and compliance controls, including access management, logging and monitoring, vulnerability management, incident response, and audit evidence collection.
Establish security best practices across infrastructure: network segmentation, firewall rules, encryption (data at rest and in transit), secrets management, and privileged access management.
Lead incident response for infrastructure and platform issues, including root cause analysis, remediation, and process improvements.
Own Disaster Recovery strategy and execution: define RPO/RTO targets, architect multi‑region and hybrid DR solutions, develop runbooks, and conduct regular DR testing.
Ensure backup and restore capabilities across all critical systems with documented procedures and validated recovery processes.

Desktop & End‑User Support

Oversee desktop, endpoint, and telecom services (laptops, mobile devices, productivity tools, collaboration platforms, voice/conferencing) to deliver reliable, secure employee experiences.
Implement IT service management practices (incident, request, problem, asset management) with clear SLAs and user satisfaction metrics.
Manage vendor relationships across infrastructure, telecom, SaaS, and managed services—evaluate contracts, optimize licensing, and ensure service quality.

Required Qualifications

10+ years of progressive experience in IT infrastructure and operations, including 3–5+ years in a leadership role managing teams that deliver hybrid cloud environments.
Deep expertise with Microsoft Azure including compute (VMs, App Services, Functions), networking (VNets, NSGs, load balancers), identity (Azure AD/Entra, RBAC), security, monitoring, and cost management.
Proven track record architecting and operating highly available, mission‑critical systems supporting 24x7 customer‑facing platforms at enterprise scale.
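Strong background in security and compliance, with hands‑on experience implementing controls such as access management, logging and monitoring, vulnerability management, and audit evidence collection.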
Demonstrated leadership of DevOps/SRE teams with hands‑on experience building CI/CD pipelines, managing Kubernetes clusters, implementing Infrastructure as Code (Terraform, ARM/Bicep), and operating observability platforms.
Solid understanding and ownership of change management processes (ITIL or similar) including change advisory boards, risk assessment, and audit‑ready documentation.
Hands‑on experience designing and executing Disaster Recovery strategies in cloud and data center environments, including DR testing and runbook development.
Experience overseeing desktop/end‑user support and telecom services in a growing, distributed organization.
Proven ability to recruit, develop, and retain high‑performing technical teams with a coaching‑oriented leadership style.
Excellent communication and stakeholder management skills—ability to translate technical complexity into business impact for executive and non‑technical audiences.
Thrives in fast‑paced, dynamic environments with rapidly changing priorities and ambiguity.
Strong ownership mentality: you take accountability for outcomes, drive issues to resolution, and lead by example.

Preferred Skills

Experience in B2B SaaS, telematics, fleet management, IoT, or other real‑time, data‑intensive platforms serving enterprise customers.
Familiarity with ITSM tools (Jira Service Management, ServiceNow), configuration management databases (CMDBs), and IT asset management practices.
Experience with observability and monitoring platforms (Datadog, New Relic, Prometheus/Grafana, Azure Monitor, Application Insights).
Background supporting real‑time GPS tracking, vehicle telematics, or IoT device management platforms.
Relevant certifications, such as Microsoft Certified: Azure Solutions Architect Expert, Microsoft Certified: Azure Administrator Associate, CISSP, CISM, or ITIL Foundation (or higher).
Experience scaling infrastructure to support rapid business growth (2×–3× revenue in 2–3 years).
Prior experience operating in regulated or compliance‑driven environments (SOC 2, ISO 27001, HIPAA, FedRAMP).
Hands‑on experience with Azure Kubernetes Service (AKS), and with Azure DevOps, GitHub Actions, or similar CI/CD platforms.
Understanding of fleet management industry compliance requirements (FMCSA, ELD mandates, hours‑of‑service regulations).