Propio

Audio AI Engineer Job at Propio in Overland Park

Propio, Overland Park, KS, United States, 66213

Description

Propio is on a mission to make communication accessible to everyone. As a leader in real-time interpretation and multilingual language services, we connect people with the information they need across language, culture, and modality. We’re committed to building AI-powered tools to enhance interpreter workflows, automate multilingual insights, and scale communication quality across industries.

We are hiring an Audio AI Engineer who will develop and optimize end-to-end systems that enable real-time, high-fidelity speech-to-speech interpretation at Propio. This role focuses on seamlessly connecting speech recognition, translation, and synthesis technologies to create natural, low-latency interpretation experiences.

Key Responsibilities

  • Design and optimize end-to-end Speech-to-Speech pipelines that integrate ASR, translation, and TTS with minimal latency
  • Build bidirectional interpretation systems that handle turn-taking, speaker identification, and context preservation across language boundaries
  • Collaborate with the Audio/Speech Engineer to optimize latency, quality, and robustness of speech components in the full pipeline
  • Work with the Staff ML Engineer to design efficient inference architectures and deployment strategies for real-time streaming systems
  • Develop streaming ASR and TTS systems capable of handling continuous, overlapping speech in interpretation scenarios
  • Benchmark and optimize latency across all pipeline stages (speech capture, recognition, translation, synthesis)
  • Integrate speaker diarization, acoustic environment adaptation, and speech enhancement into interpretation workflows
  • Partner with linguists and product teams to validate interpretation quality and gather domain-specific feedback

Qualifications

  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Science, or a related field
  • 3+ years of experience in speech processing, audio engineering, or conversational AI systems
  • Deep expertise in ASR, TTS, and streaming audio architectures
  • Proficiency in Python and ML frameworks, plus experience with real-time signal processing
  • Experience building low-latency production systems and optimizing for inference performance
  • Strong understanding of interpretation workflows, multilingual challenges, and speech quality metrics

Preferred Qualifications

  • Experience building speech-to-text pipelines or hybrid ASR + LLM systems
  • Familiarity with real-time audio processing or latency-sensitive applications