
Principal Machine Learning Engineer, Mobile AI Inference Optimization
Unity — Mountain View, CA, United States
The opportunity
We are building the next generation of mobile game AI experiences, deploying world models directly on mobile devices. As our Principal Machine Learning Engineer, you will be the foremost technical authority on bringing state‑of‑the‑art multi‑modal models (transformers, diffusion networks, and JEPA‑style architectures) from research to production on mobile hardware.
This is a deeply hands‑on, high‑impact role. You will define the inference strategy, drive architectural decisions across the full mobile ML stack, and mentor a team of senior and mid‑level engineers. Your work will directly determine the latency, quality, and power profile of AI‑driven features experienced by billions of mobile game players.
What you’ll be doing
- Technical Leadership: Set the technical vision and roadmap for deploying multi‑modal AI models to iOS and Android, spanning transformers, diffusion models, and JEPA‑style generative architectures.
- Make authoritative decisions on model compression, quantization, pruning, and knowledge distillation strategies to meet mobile latency and memory budgets.
- Evaluate and select inference runtimes (e.g., CoreML, ONNX Runtime Mobile, TFLite, ExecuTorch) and drive adoption across the team.
- Own the end‑to‑end optimization pipeline: from model export and graph transformation to hardware‑specific kernel tuning on NPU, GPU, and CPU.
- Architecture & Research Translation: Collaborate directly with research scientists to translate novel model architectures into deployable, mobile‑optimized implementations.
- Design scalable systems for multi‑modal inference that process diverse inputs — images, text, primitives, and metadata — and produce pixel‑level outputs with real‑time performance.
- Pioneer new approaches to dynamic resolution, token reduction, and speculative decoding tailored to mobile constraints.
- Track and rapidly adopt breakthroughs in efficient diffusion (e.g., consistency models, flow matching) and efficient attention (e.g., FlashAttention, linear attention variants).
- Team & Cross‑Functional Leadership: Lead and mentor a team of ML engineers; define engineering best practices, code review standards, and on‑device benchmarking methodology.
- Partner with platform engineers, product managers, and runtime teams to align ML capabilities with device SKU constraints and product roadmaps.
- Champion a culture of measurement: define KPIs for latency, accuracy, memory, and power consumption and ensure the team tracks them rigorously.
What we’re looking for
- 8+ years in ML engineering, with at least 3 years focused on on‑device / edge inference optimization.
- Proven production deployment of transformer‑based models (e.g., ViT, LLaMA, Stable Diffusion) and/or JEPA‑style generative architectures on mobile or embedded hardware.
- Hands‑on expertise with CoreML, TFLite, ONNX Runtime, and/or ExecuTorch; deep understanding of operator fusion, memory layout, and runtime scheduling.
- Expert‑level command of INT8/INT4/FP16 quantization, weight sharing, structured/unstructured pruning, and knowledge distillation.
- Strong understanding of mobile SoC architectures (Apple Neural Engine, Qualcomm Hexagon/Adreno, ARM Mali) and how to target each for peak throughput.
- Proficiency in C++ / Objective‑C / Swift for runtime integration; solid Python for training‑side tooling and export pipelines.
- Ability to read, implement, and extend ML research papers; familiarity with efficient attention, diffusion samplers, and multi‑modal fusion techniques.
- Track record of technical leadership: setting direction, influencing cross‑functional partners, and growing engineers.
You might also have
- Experience shipping world‑model or neural rendering pipelines (NeRF, 3DGS, or similar) on mobile.
- Contributions to open‑source ML inference frameworks or mobile ML research publications.
- Familiarity with compiler stacks such as MLIR, TVM, or XLA for custom kernel generation.
- Background in real‑time graphics or game engine pipelines (Metal, Vulkan, OpenGL ES).
Additional information
- International relocation support is not available for this position.
Benefits
- Comprehensive health, life, and disability insurance
- Commute subsidy
- Employee stock ownership
- Competitive retirement/pension plans
- Generous vacation and personal days
- Support for new parents through leave and family‑care programs
- Office snacks
- Mental Health and Wellbeing programs and support
- Employee Resource Groups
- Global Employee Assistance Program
- Training and development programs
- Volunteering and donation matching program
Language requirement
This position requires sufficient English proficiency for professional verbal and written communication, as performance of duties involves frequent and regular exchanges with colleagues and partners worldwide whose common language is English.
EEO statement
Unity is a proud equal opportunity employer. We are committed to fostering an inclusive, innovative environment and celebrate our employees across age, race, color, ancestry, national origin, religion, disability, sex, gender identity or expression, sexual orientation, or any other protected status in accordance with applicable law. If you have a disability and there are preparations or accommodations we can make to ensure you have a comfortable and positive interview experience, please fill out this form to let us know.
Salary
$278,100 — $347,600 USD (gross)