Audio/Multimodal AI PhD Intern

Reality Defender · New York, NY, USA · 1 day ago

Job type: Full Time

The Multimodal AI Internship

The 4-month internship is designed for current PhD students and candidates to partner with Reality Defender's AI team to conduct cutting-edge research and publish peer-reviewed papers. Your primary collaborators will be Surya Koppisetti and Yi Zhu, who will guide and advise your efforts within multi-modal deepfake detection. This internship can be performed remotely, although you're welcome to work from our HQ in New York City.
What you'll do:
Investigate and propose new methods for detecting generative multi-modal content, spanning audio and vision modalities.
Perform research on multi-modal deepfake detection and reasoning tasks.
Collaborate with researchers on the team.
Write up research results for internal reports and for submission to academic journals and workshops.
Independently implement and evaluate ideas on a modern deep learning stack: Python, PyTorch, and GPU-enabled cloud compute such as AWS or GCP.
Who you are:
PhD student in a relevant technical field, preferably three or more years into the program.
Experience in multi-modal learning, such as audio-visual classification or audio-language reasoning.
Proficient in Python and in building deep learning models with PyTorch.
Published peer-reviewed research papers in reputable AI and speech venues, e.g. CVPR, NeurIPS, ACL, Interspeech.
Excited about Reality Defender's mission to build a best-in-class and comprehensive deepfake and AI-generated content detection platform.
Available to start May 1st, 2026, for a minimum duration of 4 months.