
Sr. Network Engineer/Rack Solution Job at Supermicro in San Jose

Supermicro, San Jose, CA, United States


Job Req ID: 27692

About Supermicro:

Supermicro is a top-tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest-growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has given us the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

As a Sr. Network Engineer, you'll be the go-to person to roll out and maintain business-critical applications and services for Supermicro. You are also responsible for resolving escalated service issues, coaching other engineers to resolutions, and engineering and implementing complex projects. You should be able to work independently, lead technical development, and communicate clearly.

Essential Duties and Responsibilities:

Includes the following essential duties and responsibilities (other duties may also be assigned):
* Execute comprehensive system-level rack tests on the latest NVIDIA and AMD GPUs and on ARM-based, Intel Xeon, and AMD EPYC processors, encompassing functionality, compatibility, performance, stress, and reliability testing, leveraging proprietary in-house tools.
* Establish expertise in HPC/AI applications and benchmarks, delivering impactful training sessions to customers and partners, while addressing complex customer support issues, demonstrating innovative problem-solving skills and building robust processes and procedures for HPC/AI solutions.
* Conduct proof-of-concept design and testing, providing optimized benchmarks for HPC/AI applications in a timely manner. Fine-tune BIOS settings, optimize OS/network configurations, and develop diverse simulation configurations to enhance efficiency across various workloads.
* Deliver on-site deployment services, ensuring customer acceptance verification and providing post-level 1&2 support. Create and maintain technical documentation, including technical notes, blogs, and diagrams, to facilitate knowledge dissemination.
* Identify and document hardware and software quality issues and collaborate with Product Management and other Engineering teams to integrate customer feedback into future product enhancements.
* Proactively engage in HPC roadmap development, planning software and hardware upgrades to sustain exceptional HPC infrastructure performance.
* Document and analyze test plans, reports, and logs, and actively contribute to the development of test utilities and automation scripts to streamline testing processes.

Qualifications:

* BS/MS in Electrical Engineering, Computer Engineering or Computer Science
* 8+ years of work-related experience in Deep Learning and Machine Learning
* 8+ years of Linux/networking debugging/testing or relevant experience preferred
* 8+ years of data center, enterprise, or telecommunications experience working on routing and switching networking technologies
* Experience with DevOps or in cloud environments, including but not limited to Docker/Containers and Kubernetes
* Hands-on experience with workload managers/schedulers (Slurm) for racks/clusters
* Familiarity with MLPerf Training/Inference benchmarks, LLMs, HPL-AI, or RCCL/NCCL
* Programming experience with Windows and Linux shell scripting
* Strong team player with strong communication skills



Desired Skills:

1. Familiarity with Intel/AMD/NVIDIA development toolkits such as CUDA, oneAPI, and ROCm
2. Relevant certifications such as CCIE, JNCIE, or Arista ACE are highly desirable
3. Experience with server/network hardware debugging and troubleshooting
4. CCNA, OpenStack, OpenShift, Azure or AWS



Please note that this position requires regular in-office attendance. The successful candidate is expected to be present in the office during standard working hours as determined by the company. In-office collaboration and participation in team meetings, training sessions, and other on-site activities are essential aspects of this role. Candidates should consider the commuting distance and be prepared to fulfill their responsibilities in the designated office location.

Salary Range

$137,000 - $156,000


The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role. In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.



