Director of Platform Engineering (Pleasanton)

MeeruAI, Pleasanton, CA, United States

Title: Director of Platform Engineering
Location: Remote (US Preferred)
Reports to: Head of Engineering
Team Size: 5–10 initially, scaling to 15+
Critical Hire Timeline: Week 2–3 (platform foundation required for Maverick launch)

Position Overview

We are seeking an exceptional Director of Platform Engineering to serve as a strategic partner to the Head of Engineering and drive technical excellence, operational efficiency, and business impact across the entire engineering organization.

This is a dual-mandate leadership role that combines:
- Platform & Infrastructure ownership
- Engineering Operations leadership
- Strategic business partnership
- Organizational excellence and execution rigor

This role goes far beyond traditional DevOps or Platform leadership. You will be the right hand to the Head of Engineering, owning day-to-day operational excellence while enabling product velocity, AI scalability, and enterprise-grade reliability.

Role Scope & Accountability

Area                              Ownership
Platform & Infrastructure         40%
Engineering Operations            30%
Strategic Business Partnership    20%
Organizational Excellence         10%

Leadership & Collaboration Expectations (Non-Negotiable)
- Resolve disagreements privately; present aligned positions publicly
- No unaligned executive escalations
- Transparent risk communication with aligned mitigation plans
- Platform team must be viewed as an enabler, not a blocker
- Influence through trust, data, and partnership, not authority

Key Responsibilities

I. Platform & Infrastructure Leadership (40%)

Cloud, Architecture & Scalability
- Own AWS infrastructure strategy (EKS, RDS, VPC, IAM, networking)
- Define multi-tenant SaaS patterns: shared DB with row-level security (RLS), and silo for enterprise
- Scale the platform from 10 → 100 → 500+ customers
- Drive vendor evaluation and build vs. buy decisions
- Ensure reliability, performance, security, and cost efficiency

AI / ML Infrastructure & MLOps (Critical)

Self-Hosted LLM Infrastructure
- Deploy and operate self-hosted small language models (SLMs) for privacy and cost efficiency
- GPU infrastructure (AWS P4/G5, 8–16 GPUs)
- Model serving: vLLM, TGI, Ray Serve
- Fine-tuning pipelines (LoRA, QLoRA)
- Quantization (4-bit / 8-bit) and autoscaling

Model Deployment & APIs
- Deploy predictive ML models (forecasting, classification, anomaly detection)
- Real-time inference (
- CI/CD for models with canary and blue-green deployments
- Drift detection, accuracy tracking, rollback

AI Cost Management & Pricing Enablement
- Token and GPU cost tracking per tenant and feature
- Unit economics for AI workloads
- API-based vs. self-hosted break-even modeling
- Prompt caching, response caching, and batching strategies (40–60% savings)

AI Observability & SLOs
- LLM latency (p50/p95/p99), success rates, token usage
- Agent performance (completion rate, tool success, latency)
- RAG quality metrics and retrieval accuracy
- Cost anomaly detection and alerting

DevOps, Reliability & SRE
- Build and scale the DevOps/SRE team (3–4 → 8–10)
- CI/CD with
- Define SLAs/SLOs (99.9% uptime target)
- Incident response, blameless postmortems, MTTR/MTTD tracking
- Disaster recovery and business continuity planning

Security & Compliance
- Own SOC 2 Type I & II, GDPR, and HIPAA readiness
- Zero Trust security architecture
- Vulnerability management and penetration testing
- Security team hiring and security champions program
- Incident response and forensics

Cloud Cost Optimization (FinOps)
- Own the AWS + AI budget ($50K → $500K+/month)
- Reserved instances, spot strategies, right-sizing
- Cost allocation by tenant and team
- Target: 20% YoY cost reduction

II. Engineering Operations Leadership (30%)

Talent & People Operations
- Hiring strategy for 33–44 engineers in Year 1
- Build offshore development centers (India / Eastern Europe)
- Own performance reviews, promotions, PIPs, and exits
- Define career ladders, leveling, and compensation bands
- Coach managers and directors

Engineering Productivity & Tools
- Own dev tooling: GitHub, CI/CD, Jira/Linear, Notion, Datadog
- Track DORA metrics, cycle time, and developer NPS
- Reduce friction via automation and internal platforms
- Vendor management and SaaS consolidation

Process & Execution Excellence
- Agile ceremonies, RFCs, architecture reviews
- Release management and dependency coordination
- Executive dashboards and KPI reporting
- Conflict resolution via private alignment and consensus

III. Strategic Business Partnership (20%)

Platform & AI Pricing Strategy
- Define SaaS + AI pricing tiers (Starter / Pro / Enterprise)
- Usage-based AI pricing (queries, tokens, agents)
- Gross margin modeling (>70% infra, >60% AI)
- Cost-to-serve and break-even analysis

Financial Planning & Advisory
- Engineering budget ownership ($4.7M–$6M)
- Headcount and infrastructure forecasting
- ROI analysis for infrastructure investments
- Vendor negotiation (AWS, Datadog, Auth0, LLM providers)

Strategic Leadership
- Identify blind spots proactively
- Serve as quarterly and annual planning partner to the Head of Engineering
- Support Sales on enterprise deals and security reviews
- Board- and investor-facing technical leadership

IV. Organizational Excellence (10%)
- Define and reinforce engineering culture and values
- Knowledge management, documentation, onboarding playbooks
- Executive communication and board-level reporting
- Foster a high-trust, high-performance environment

Required Qualifications

Technical
- 12+ years of engineering experience, including 6+ years in leadership
- AWS at scale (EKS, RDS, VPC, IAM)
- Kubernetes, Terraform, CI/CD, DevSecOps
- Required: self-hosted LLMs, GPU infrastructure, and MLOps in production
- AI cost optimization and observability experience
- Security and compliance leadership (SOC 2, GDPR, HIPAA)

Operational & Business
- Led 30–50+ person engineering orgs
- Hiring, performance management, and org design
- $5M+ engineering budgets
- SaaS unit economics and pricing strategy
- Executive-level communication and diplomacy

Preferred Qualifications
- VP Engineering experience at a Series A/B/C startup
- Large-scale AI/GPU deployments (100+ GPUs)
- Fintech or regulated-domain experience
- Offshore center build-out experience
- MBA or executive leadership training