Machine Learning Engineer - Inference Job at Together AI in San Francisco

Together AI, San Francisco, CA, United States

About the Role

Together AI is seeking a Machine Learning Engineer to join our Inference Engine team, focusing on optimizing and enhancing the performance of our AI inference systems. This role involves working with state-of-the-art large language models and ensuring they run efficiently and effectively at scale. If you are passionate about AI inference, PyTorch, and developing high-performance systems, we want to hear from you. This position offers the chance to collaborate closely with AI researchers and engineers to create cutting-edge AI solutions. Join us in shaping the future at Together AI!

Responsibilities

Design and build the production systems that power the Together AI inference engine, enabling reliability and performance at scale.
Develop and optimize runtime inference services for large-scale AI applications.
Collaborate with researchers, engineers, product managers, and designers to bring new features and research capabilities to the world.
Conduct design and code reviews to ensure high standards of quality.
Create services, tools, and developer documentation to support the inference engine.
Implement robust and fault-tolerant systems for data ingestion and processing.

Requirements

3+ years of experience writing high-performance, well-tested, production-quality code.
Proficiency with Python and PyTorch.
Demonstrated experience in building high-performance libraries and tooling.
Excellent understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
Preferred: Knowledge of existing AI inference systems such as TGI, vLLM, TensorRT-LLM, and Optimum.
Preferred: Knowledge of AI inference techniques such as speculative decoding.
Preferred: Knowledge of CUDA/Triton programming.
Nice to have: Knowledge of Rust, Cython and compilers.

About Together AI

Together AI is a research-driven artificial intelligence company. We believe open and transparent AI systems will drive innovation and create the best outcomes for society. Together, we are on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models. We have contributed to leading open-source research, models, and datasets to advance the frontier of AI. Our team has been behind technological advancements such as FlashAttention, Hyena, FlexGen, and RedPajama. We invite you to join a passionate group of researchers and engineers in our journey to build the next-generation AI infrastructure.

Compensation

We offer competitive compensation, startup equity, health insurance, and other competitive benefits. The US base salary range for this full-time position is $160,000 - $230,000 + equity + benefits. Our salary ranges are determined by location, level, and role. Individual compensation will be determined by experience, skills, and job-related knowledge.

Equal Opportunity

Together AI is an Equal Opportunity Employer and is proud to offer equal employment opportunities to everyone regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, veteran status, and more.

Please see our privacy policy at https://www.together.ai/privacy
