
Director of Platform Engineering & Operations -- MAZDC5764907
Compunnel Inc., Charlotte, NC, United States
Role Summary
The Director of Platform Engineering & Operations is responsible for COMPANY’s entire technology platform, overseeing both customer-facing systems and internal infrastructure to ensure 24x7 availability, security, and scalability across Azure cloud and on-premises environments. This hands‑on leadership role balances technical execution with strategic management, including building a high-performing team, driving operational excellence, implementing security controls, and supporting the company’s rapid growth.
Key Responsibilities
- Build, mentor, and retain a team of 8 engineers across infrastructure/network, DevOps/SRE, and desktop/end‑user support, providing technical coaching, career development, and performance management.
- Define and track operational KPIs (availability, MTTR, change success rate, incident volume, cloud cost efficiency) and present regular updates to the CTO and executive team.
- Take full ownership of platform strategy, roadmap, and execution aligned with business objectives, product needs, and customer SLAs.
- Establish operational cadence: incident reviews, change advisory board, service desk metrics, team retrospectives, and continuous improvement culture.
- Own the design, implementation, and 24x7 operation of COMPANY’s hybrid infrastructure (Azure + on‑premises) supporting both production and internal corporate systems.
- Ensure high availability, scalability, performance, security, and cost efficiency across all environments.
- Lead hands‑on architecture and implementation of cloud infrastructure, networking, identity management (Azure AD/Entra, RBAC), storage, backup, monitoring, and observability.
- Drive cloud optimization initiatives: rightsizing, reserved capacity, architectural improvements, and cost governance across Azure workloads.
- Define and enforce platform standards for networking, security, identity, logging, alerting, and operational discipline.
DevOps & Site Reliability Engineering
- Lead DevOps and SRE transformation: implement CI/CD pipelines, Infrastructure as Code (Terraform, ARM/Bicep), containerization (Kubernetes), and modern deployment practices.
- Lead hands‑on implementation of Kubernetes clusters, container orchestration, service mesh, and cloud‑native architecture patterns.
- Establish SRE principles: error budgets, SLOs/SLIs, blameless postmortems, observability (metrics/logs/traces), and reliability engineering culture.
- Build and optimize CI/CD tooling and workflows to improve release velocity, reduce deployment risk, and increase developer productivity.
- Implement robust change management processes (risk assessment, testing, communication, rollback procedures) that balance speed, safety, and audit readiness.
Information Security & Compliance
- Implement security and compliance controls, including access management, logging and monitoring, vulnerability management, incident response, and audit evidence collection.
- Establish security best practices across infrastructure: network segmentation, firewall rules, encryption (data at rest/in transit), secrets management, privileged access management.
- Lead incident response for infrastructure and platform issues, including root cause analysis, remediation, and process improvements.
- Own Disaster Recovery strategy and execution: define RPO/RTO targets, architect multi‑region and hybrid DR solutions, develop runbooks, and conduct regular DR testing.
- Ensure backup and restore capabilities across all critical systems with documented procedures and validated recovery processes.
Desktop & End‑User Support
- Oversee desktop, endpoint, and telecom services (laptops, mobile devices, productivity tools, collaboration platforms, voice/conferencing) to deliver reliable, secure employee experiences.
- Implement IT service management practices (incident, request, problem, asset management) with clear SLAs and user satisfaction metrics.
- Manage vendor relationships across infrastructure, telecom, SaaS, and managed services—evaluate contracts, optimize licensing, and ensure service quality.
Required Qualifications
- 10+ years of progressive experience in IT infrastructure and operations, including 3–5 years in a leadership role managing teams delivering hybrid cloud environments.
- Deep expertise with Microsoft Azure including compute (VMs, App Services, Functions), networking (VNets, NSGs, load balancers), identity (Azure AD/Entra, RBAC), security, monitoring, and cost management.
- Proven track record architecting and operating highly available, mission‑critical systems supporting 24x7 customer‑facing platforms at enterprise scale.
- Strong background in security and compliance, with experience implementing controls such as access management, logging and monitoring, vulnerability management, and incident response.
- Demonstrated leadership of DevOps/SRE teams with hands‑on experience building CI/CD pipelines, managing Kubernetes clusters, implementing Infrastructure as Code (Terraform, ARM/Bicep), and operating observability platforms.
- Solid understanding and ownership of change management processes (ITIL or similar) including change advisory boards, risk assessment, and audit‑ready documentation.
- Hands‑on experience designing and executing Disaster Recovery strategies in cloud and data center environments, including DR testing and runbook development.
- Experience overseeing desktop/end‑user support and telecom services in a growing, distributed organization.
- Proven ability to recruit, develop, and retain high‑performing technical teams with a coaching‑oriented leadership style.
- Excellent communication and stakeholder management skills—ability to translate technical complexity into business impact for executive and non‑technical audiences.
- Thrives in fast‑paced, ambiguous environments with rapidly changing priorities.
- Strong ownership mentality: takes accountability for outcomes, drives issues to resolution, and leads by example.
Preferred Skills
- Experience in B2B SaaS, telematics, fleet management, IoT, or other real‑time, data‑intensive platforms serving enterprise customers.
- Familiarity with ITSM tools (Jira Service Management, ServiceNow), configuration management databases (CMDB), and IT asset management practices.
- Experience with observability and monitoring platforms (Datadog, New Relic, Prometheus/Grafana, Azure Monitor, Application Insights).
- Background supporting real‑time GPS tracking, vehicle telematics, or IoT device management platforms.
- Relevant certifications: Microsoft Certified: Azure Solutions Architect Expert, Azure Administrator Associate, CISSP, CISM, ITIL Foundation or higher.
- Experience scaling infrastructure to support rapid business growth (2×–3× revenue in 2–3 years).
- Prior experience operating in regulated or compliance‑driven environments (SOC 2, ISO 27001, HIPAA, FedRAMP).
- Hands‑on experience with Azure Kubernetes Service (AKS), Azure DevOps, GitHub Actions, or similar CI/CD platforms.
- Understanding of fleet management industry compliance requirements (FMCSA, ELD mandates, hours‑of‑service regulations).