Deepstreamtech is seeking an experienced software engineer to optimize and scale our ML training systems. You will manage critical infrastructure for large-scale training, working closely with researchers to turn their ideas into production training runs. The role requires hands-on experience with JAX and with cluster and orchestration platforms such as SLURM and Kubernetes. This position is key to keeping large-scale model training reliable, reproducible, and efficient, and it contributes directly to our core ML efforts.

ML Infrastructure Engineer: Scale JAX/TPU Training
Deepstreamtech · San Francisco, CA, USA
- Pay: 125,000
- Job type: Full-time