Tencent

Tencent, Bellevue, WA, United States, 98009

Research Scientist – Speech and Audio Understanding (Large Models & Multimodal Systems)

Responsibilities

  • Develop general-purpose, end-to-end large speech models covering multilingual ASR, speech translation, speech synthesis, paralinguistic understanding, and general audio understanding.
  • Advance research on speech representation learning and encoder/decoder architectures to build unified acoustic representations for multi-task and multimodal applications.
  • Explore representation alignment and fusion mechanisms between audio/speech and other modalities in large multimodal models, enabling joint modeling with image and text.
  • Build and maintain high-quality multimodal speech datasets, including automatic annotation and data synthesis technologies.

Qualifications

  • Ph.D. in Computer Science, Electrical Engineering, Artificial Intelligence, Linguistics, or a related field; or Master’s degree with several years of relevant experience.
  • Solid understanding of speech and audio signal processing, acoustic modeling, language modeling, and large model architectures.
  • Proficient in developing at least one core speech system, such as ASR, TTS, or speech translation; experience with multilingual, multitask, or end-to-end systems is a plus.
  • Experience in speech representation pretraining (e.g., HuBERT, Wav2Vec, Whisper) and multimodal alignment and cross‑modal modeling (audio‑visual‑text).
  • Experience achieving state-of-the-art (SOTA) results on audio understanding tasks with large models.
  • Proficient in deep learning frameworks such as PyTorch or TensorFlow; experience with large‑scale training and distributed systems is a plus.
  • Familiar with Transformer‑based architectures and their applications in speech and multimodal training/inference.

Location and Compensation

Location: Bellevue, Washington, United States

Expected base pay range: $122,500 – $229,700 per year.

Benefits include signing bonus, relocation package, restricted stock units, medical, dental, vision, life and disability benefits, 401(k) plan, vacation, holidays, and paid sick leave.

Equal Employment Opportunity

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.