Tether Operations Limited

Senior Research Engineer - Multimodal & Video Foundation Model (Remote)

Tether Operations Limited, San Francisco, CA, US, 94199

Duration: Full Time

Overview

Join Tether and Shape the Future of Digital Finance. At Tether, we’re pioneering a global financial revolution with solutions that empower businesses—from exchanges and wallets to payment processors and ATMs—to integrate reserve-backed tokens across blockchains. We enable you to store, send, and receive digital tokens securely and globally, at a fraction of the cost. Transparency is the bedrock of our work.

Innovate with Tether

Tether Finance: Our product suite features the world’s most trusted stablecoin, USDT, relied upon by hundreds of millions worldwide, alongside tokenization services.

Tether Power: Driving sustainable growth, our energy solutions optimize excess power for Bitcoin mining using eco-friendly practices in geo-diverse facilities.

Tether Data: Fueling breakthroughs in AI and peer-to-peer technology, we reduce infrastructure costs and enhance global communications with solutions like KEET, our flagship app for secure and private data sharing.

Tether Education: Democratizing access to high-quality digital learning, empowering individuals to thrive in the digital and gig economies and drive global growth.

Tether Evolution: Pushing the boundaries of technology and human potential to craft a future where innovation and human capabilities merge in powerful ways.

Why Join Us?

We are a global, remote-first team. If you’re passionate about making a mark in fintech, this is your opportunity to collaborate with leading minds, push boundaries, and set new standards. We’ve grown fast, stayed lean, and established ourselves as a leader in the industry. If you have excellent English communication skills and are ready to contribute to the most innovative platform on the planet, Tether is the place for you.

Are you ready to be part of the future?

About the job

As a member of the AI model team, you will drive architecture development for cutting-edge models across scales (small, large, and multimodal). Your work will enhance intelligence, improve efficiency, and introduce new capabilities to advance the field.

You should bring deep expertise in video generation model architectures and a hands-on, research-driven approach. Your mission is to explore and implement novel techniques and algorithms that lead to significant advancements, including curating data, strengthening baselines, and resolving pre-training bottlenecks to push model performance.

Responsibilities

  • Pioneer multimodal and video-centric research, contributing to usable prototypes and scalable systems.
  • Design and implement novel AI architectures for multimodal language models, integrating text, visual, and audio modalities.
  • Engineer scalable training and inference pipelines optimized for large-scale multimodal datasets and distributed systems spanning thousands of GPUs.
  • Optimize systems and algorithms for efficient data processing, model execution, and pipeline throughput.
  • Build modular tools for preprocessing, analyzing, and managing multimodal data assets (images, video, text).
  • Collaborate cross-functionally with research and engineering teams to translate innovations into production-grade solutions.
  • Prototype generative AI applications showcasing new capabilities of multimodal foundation models in real-world products.
  • Develop benchmarking tools to evaluate model performance across diverse multimodal tasks.

Requirements

  • Bachelor’s degree in Computer Science, Computer Engineering, or a related technical field, or equivalent practical experience
  • Expertise in Python and PyTorch, with experience across the full development pipeline from data processing to training, inference, and optimization
  • Experience working with large-scale text data; experience with interleaved data spanning audio, video, image, and/or text is a bonus
  • Hands-on experience developing or benchmarking at least one of: LLMs, Vision-Language Models, Audio Language Models, generative video models

Nice-to-have skills

  • PhD in Computer Vision, Machine Learning, NLP, Computer Science, Applied Statistics, or a related field
  • Expertise in computer vision, video generation foundation models, and/or multimodal research
  • First-author publications at leading AI conferences (e.g., CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS)

Important information for candidates

Recruitment scams are increasingly common. To protect yourself, please keep the following in mind when applying:

  • Apply only through our official channels. We do not use third-party platforms or agencies for recruitment unless clearly stated. All open roles are listed on our official careers page: https://tether.recruitee.com/
  • Verify the recruiter’s identity. All our recruiters have verified LinkedIn profiles. If unsure, confirm via their profile or through our website.
  • Be cautious of unusual communication methods. We do not conduct interviews over WhatsApp, Telegram, or SMS. All communication is through official company emails and platforms.
  • Double-check email addresses. Communications from us will come only from email addresses ending in one of the domains tether.to or tether.io.
  • We will never request payment or financial details. If someone asks for personal financial information or payment during the hiring process, it is a scam. Please report it immediately.

When in doubt, contact us through our official website.