Meta Platforms

Research Scientist Intern, Multimodal Audio Generation (PhD)

Meta Platforms, Burlingame, CA, US, 94012

Duration: Full Time

Meta was built to help people connect and share, and over the last decade our tools have played a critical part in changing how people around the world communicate with one another. With over a billion people using the service and more than fifty offices around the globe, a career at Meta offers countless ways to make an impact in a fast-growing organization.

Meta's Core AI team is seeking a Research Scientist Intern with a focus on audio generation, especially music and song generation from multimodal input. Our team is pioneering AI research across text, audio, and video domains, with a mission to develop AI-driven foundational models and their applications. We are committed to advancing state-of-the-art algorithms, promoting open research, and fostering scientific innovation in all aspects of AI for language, including language modeling, natural language understanding and generation, audiovisual learning, on-device/personalized LM, and multimodal applications.

As a Research Scientist Intern, you will play a crucial role in developing cutting-edge models and algorithms in AI Research. We are seeking a candidate with expertise in multimodal learning and audio generation. The ideal candidate will have a strong background in deep learning and general machine learning, coupled with a deep passion for computer vision and audio processing. In this position, you will work with domain experts to understand the challenges and build state-of-the-art models to tackle them. Our internships are twelve (12) to twenty-four (24) weeks long, and we have various start dates throughout the year.

Responsibilities

Lead and contribute to cutting-edge audio (music and song) generation model research that leads to publications at top-tier conferences
Perform research to tackle unsolved real-world problems and push the state of the art
Independently design and implement algorithms, train advanced foundational models on large datasets, and evaluate their performance
Define, plan, and execute cutting-edge deep learning research to advance product experiences that use audio generation features
Communicate experimental results and recommendations clearly, both within the group and to cross-functional groups

Minimum Qualifications

Currently pursuing a PhD in Artificial Intelligence or a related field
Research experience in one or more of these areas: machine learning, deep learning, generative AI, audio processing or related fields
Knowledge of state-of-the-art deep learning methods and neural networks
Experience working with machine learning libraries such as PyTorch and JAX
Experience with scripting languages such as Python and shell
Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Preferred Qualifications

Intent to return to degree program after the completion of the internship
Experience developing scalable machine learning models in at least one of the following areas: large language models, natural language understanding or generation, efficient training and inference, multimodal models, or related areas
Experience with large-scale model training, implementing algorithms, and evaluating language systems
Proven track record of achieving significant results as demonstrated by publications at leading conferences/journals such as NeurIPS, ICLR, ICML, CVPR, ICCV, ICASSP, Interspeech, AAAI, IEEE TASLP or similar
Experience working and communicating cross-functionally in a team environment
Experience solving complex problems and comparing alternative solutions, trade-offs, and diverse points of view to determine a path forward