Machine Learning Engineer, Data Job at Hedra, Inc in San Francisco

Hedra, Inc, San Francisco, CA, United States

About Hedra
Hedra is a pioneering generative media company backed by top investors at Index, A16Z, and Abstract Ventures. We're building Hedra Studio, a multimodal creation platform built for control, emotion, and creative intelligence.
At the core of Hedra Studio is our Character-3 foundation model, the first omnimodal model in production. Character-3 jointly reasons across image, text, and audio for more intelligent video generation. It's the next evolution of AI-driven content creation.
At Hedra, we’re a team of hard‑working, passionate individuals seeking to fundamentally change content creation and build a generational company together. We value startup energy, initiative, and the ability to turn bold ideas into real products. Our team is fully in‑person in SF/NY with a shared love for whiteboard problem‑solving.
Overview
We are looking for an ML Engineer with 3+ years of experience designing, building, and maintaining data pipelines at scale. The ideal candidate has diverse experience managing data from ingest and processing through storage and training. This role is vital for ensuring the computational backbone supports the company's ML efforts, with a focus on deployment and scalability.
Responsibilities
Lead efforts to design, implement, and maintain scalable solutions for data warehousing and processing, and adapt those solutions to the evolving needs of our research teams.
Manage and optimize the performance of our computing clusters or cloud instances, such as AWS or Google Cloud, to support distributed data processing at scale.
Design data snapshots, ETL pipelines, and storage solutions with a strong focus on data shape and layout to ensure the flexibility required for training.
Collaborate across research teams to understand their data needs and provide appropriate solutions, facilitating seamless model training.
Qualifications
Bachelor's degree in Computer Science, Information Technology, or a related field.
Experience with cloud computing platforms such as Amazon Web Services, Google Cloud, or Microsoft Azure, which is essential for managing large-scale ML workloads.
Understanding of the role orchestration tools play in ML data workflows, and an appreciation for sound engineering practices such as version control and CI/CD.
Experience designing, building, and managing large-scale data pipelines for ML; experience with video data is a huge plus.
Understanding of distributed training techniques and how to scale models across multi-node clusters to meet the demands of video generation.
Strong problem‑solving and communication skills, given the need to collaborate with diverse teams.
Benefits
Competitive compensation + equity
401k (no match)
Healthcare (Silver PPO Medical, Vision, Dental)
Lunch and snacks at the office
We encourage you to apply even if you don't meet every requirement. We value curiosity, creativity, and the drive to solve hard problems.
