Meta Platforms

Research Scientist Intern, Multimodal Audio Generation (PhD)

Meta Platforms, Burlingame, CA, US, 94012

Duration: Full Time



Meta was built to help people connect and share, and over the last decade our tools have played a critical part in changing how people around the world communicate with one another. With over a billion people using the service and more than fifty offices around the globe, a career at Meta offers countless ways to make an impact in a fast-growing organization.

Meta's Core AI team is seeking a Research Scientist Intern with a focus on audio generation, especially music and song generation from multimodal input. Our team is pioneering AI research across text, audio, and video domains, with a mission to develop AI-driven foundational models and their applications. We are committed to advancing state-of-the-art algorithms, promoting open research, and fostering scientific innovation in all aspects of AI for language, including language modeling, natural language understanding and generation, audiovisual learning, on-device/personalized LM, and multimodal applications.

As a Research Scientist Intern, you will play a crucial role in developing cutting-edge models and algorithms in AI Research. We are seeking a candidate with expertise in multimodal learning and audio generation. The ideal candidate will have a strong background in deep learning and general machine learning, coupled with a deep passion for computer vision and audio processing. In this position, you will work with domain experts to understand the challenges and build state-of-the-art models to tackle them.

Our internships are twelve (12) to twenty-four (24) weeks long, and we have various start dates throughout the year.

Responsibilities
  • Lead and contribute to cutting-edge audio (music and song) generation model research that results in publications at top-tier conferences
  • Perform research to tackle unsolved real-world problems and push the state of the art
  • Independently design and implement algorithms, train advanced foundational models on large datasets, and evaluate their performance
  • Define, plan, and execute cutting-edge deep learning research to advance product experiences that use audio generation features
  • Communicate experimental results and recommendations clearly, both within the group and to cross-functional groups
Minimum Qualifications
  • Currently in the process of obtaining a PhD in Artificial Intelligence or a related field
  • Research experience in one or more of these areas: machine learning, deep learning, generative AI, audio processing or related fields
  • Knowledge of state-of-the-art deep learning methods and neural networks
  • Experience working with machine learning libraries such as PyTorch, JAX, etc.
  • Experience with scripting languages such as Python and shell scripts
  • Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment
Preferred Qualifications
  • Intent to return to the degree program after completion of the internship
  • Experience developing scalable machine learning models in at least one of the following areas: large language models, natural language understanding or generation, efficient training and inference, multimodal models, or related areas
  • Experience with large-scale model training, implementing algorithms, and evaluating language systems
  • Proven track record of achieving significant results as demonstrated by publications at leading conferences/journals such as NeurIPS, ICLR, ICML, CVPR, ICCV, ICASSP, Interspeech, AAAI, IEEE TASLP or similar
  • Experience working and communicating cross-functionally in a team environment
  • Experience solving complex problems and comparing alternative solutions, trade-offs, and diverse points of view to determine a path forward