Applied ML Engineer, Speech Job at Speak in San Francisco

Speak, San Francisco, CA, United States

About us
Our mission is to reinvent the way people learn, starting with language. We begin by teaching the next billion people English, Spanish, and French.
English is the global language of business, culture, and communication, and over 1.5 billion people around the world are actively trying to learn it right now. Others dream of communicating with the half-billion native Spanish speakers across the globe. The problem is that it's nearly impossible to learn to speak a language without constant access to a speaking partner. Grammar and vocabulary apps don't really help; you need to actually converse with someone.
Speak is on a journey to fix this. We're creating an AI-powered experience that replicates the flow of a conversation, without needing a human on the other end. The goal is to make conversation practice in a foreign language radically more accessible and, eventually, to help hundreds of millions of people reach fluency who otherwise wouldn't be able to.
We started on this journey over five years ago, and we've still got a long way to go. We're thoughtfully adding new team members only when we think they can truly play a big role in our mission.
Speak launched first in South Korea, where we quickly grew to become the top-grossing education app in the country. We have now delivered this winning product to more than 40 countries globally and are continuing to expand to more markets in the coming months. The company is well funded: as of December 2024, we've reached a $1B valuation with our Series C round, backed by key partners such as Accel, OpenAI, Founders Fund, Y Combinator, Khosla Ventures, Lachy Groom, Josh Buckley, and more. We're a team of more than 90 based throughout San Francisco, Seoul, Tokyo, Taipei, and Ljubljana.
About this role
We are looking for an experienced Machine Learning Engineer to join our team and help develop cutting-edge speech recognition models that teach language fluency. In this role you will take ownership of the end-to-end modeling pipeline, from training and experimentation to deployment and monitoring. You will also work closely with Product teams to design innovative learning experiences and to measure how production models affect our end users. We are a small, dynamic team where you will contribute as a developer and thought partner on team projects like ASR, assessment, pronunciation, content personalization, and much more. This is an incredibly exciting time to join an ML team designing a personalized learning experience that will revolutionize language learning for millions of learners worldwide. Come join us!
What you'll be doing
Training and deploying ASR models end-to-end, including monitoring, performance tracking, and retraining

Improving the pronunciation model that provides precise feedback, and making it more central to our learning app

Creating metrics to measure ASR performance across tasks and languages

Expanding our ASR systems to new languages and markets

Building and maintaining data infrastructure such as training/evaluation datasets and labeling pipelines

What we're looking for
Extensive experience training large models on GPUs and deploying custom deep learning models

Proficiency in Python and common deep learning frameworks such as PyTorch

Demonstrated experience owning ML pipelines end to end, from POC to production

Strong communication skills and the ability to explain complex ML concepts to non-technical stakeholders

Sharp product sense and an ability to think broadly and cross-functionally about model quality in the context of user experience

Bonus
Experience with speech or audio

Office
San Francisco, CA

Why work at Speak
Join a fantastic, tight-knit team at the right time: we're growing very quickly, we most recently raised our Series C from some of the top investors in Silicon Valley, and we've achieved product-market fit in our initial markets. You'd join at a magical time when a single person could significantly change the course of the company.

Do your life's work with people you’ll love working with: we care strongly about our craft and want every person at Speak to feel like they're growing every day. We believe in the idea that working with people you both enjoy and have respect for makes everything better. We hire thoughtfully and only work with people we admire deeply.

Global in nature: We're live in over 40 countries and launching in a number of new markets soon. We have dedicated offices in San Francisco, Ljubljana, Seoul, and Tokyo, and you’ll have the opportunity to talk to users in each of these regions on a regular basis as well as travel.

Impact people's lives in a major way: Learning a language is one of the most life-changing skills a person can acquire, and right now 99% of people never achieve their goal because the process is broken. We're helping millions of people achieve their goals and improve their lives.

Speak does not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
