Lead Machine Learning Engineer-MLOps Job at JPMorgan Chase & Co. in New York

JPMorgan Chase & Co., New York, NY, United States

We are looking for a Senior MLOps engineer to work closely with Data Scientists to build and deploy ML models on a modern MLOps stack.

As Lead Machine Learning Engineer on the Recommendation Engine team, you’ll build and maintain pipelines for distributed model training on large compute clusters, batch/real‑time model serving, hyperparameter tuning at scale, model monitoring, production validation and other activities vital for model development, testing and deployment in a well‑managed, controlled environment.

Our product, Personalization and Insights, builds and supports high-throughput, low-latency applications that leverage state‑of‑the‑art machine learning architectures and are deployed in AWS. These applications power personalized experiences across Chase Consumer & Community Banking channels, helping to weave together a user experience that combines traditional banking services with other services in the Travel, Merchant Offer Shopping, and Dining spaces.

Job responsibilities

Build, deploy, and maintain robust pipelines for distributed training on GPU‑enabled clusters to support scalable machine learning workflows.

Develop and manage pipelines for high‑throughput, real‑time inference as well as batch inference, ensuring optimal performance and reliability.

Implement quantization techniques and deploy large language models (LLMs) to maximize efficiency and resource utilization.

Oversee the management and optimization of vector databases to support advanced AI and machine learning applications.

Establish and maintain comprehensive monitoring and observability pipelines to ensure system health, performance, and rapid issue resolution.

Collaborate with cross‑functional teams to integrate new technologies and continuously improve existing infrastructure.

Partner with product, architecture, and other engineering teams to define scalable and performant technical solutions.

Required qualifications, capabilities, and skills

BS in Computer Science or a related Engineering field with 6+ years of experience, OR MS in Computer Science or a related Engineering field with 4+ years of experience.

Solid knowledge and extensive experience in Python.

Solid fundamentals in cloud computing, preferably AWS.

Deep knowledge of, and passion for, data science fundamentals and for training and deploying models.

Experience with monitoring and observability tools for tracking model inputs/outputs and feature statistics.

Operational experience in big data/ML tools such as Ray, DuckDB, Spark.

Solid grounding in engineering fundamentals and an analytical mindset.

Action-oriented, with an iterative approach to development.

Preferred qualifications, capabilities, and skills

Experience with recommendation and personalization systems is a plus.

Solid fundamentals and experience in containers (Docker ecosystem), container orchestration systems (Kubernetes, ECS), and DAG orchestration (Airflow, Kubeflow, etc.).

Good knowledge of databases.


