Anistar Technologies is hiring: Lead Hyperscale Network Engineer in New York

Anistar Technologies, New York, NY, United States

Job Title: Lead Hyperscale Datacenter Network Engineer
Location: Remote

Base salary range: $200,000 – $300,000, depending on experience.

Hyperscale experience is required.

Company Summary
We are seeking a Lead Network Engineer to head our Network Operations & Reliability pillar and lead the Operations & Reliability team. You'll build our network operations function from the ground up while staying hands‑on with incident response, reliability engineering, and operational tooling. We are looking for someone who is hungry for the autonomy to build a team and the processes that keep our AI datacenter fabrics running with exceptional reliability at scale.

Focus

Operations Architecture: Define and build the operational model for network reliability at scale. Establish incident response workflows, escalation procedures, runbook frameworks, and operational handoff criteria. Design the systems and processes that enable 24/7 operations across distributed datacenter regions.

Incident Response & Reliability: Own Tier 2+ incident management for network infrastructure. Lead response to critical incidents, perform root cause analysis, drive permanent fixes, and build the reliability engineering practices that prevent recurrence. Partner with NOC on Tier 1 triage and escalation workflows.

Observability & Monitoring: Build comprehensive observability for network infrastructure including monitoring stack integration, alerting frameworks, telemetry collection, and performance analytics. Ensure operators have visibility into fabric health, traffic patterns, and failure conditions across all network layers.

Runbook Development: Author and maintain operational runbooks for common failure scenarios, maintenance procedures, and troubleshooting workflows. Build the knowledge base that enables NOC (Tier 1) and regional operations engineers to respond effectively to incidents.

Automation & Tooling: Drive operational automation initiatives including auto‑remediation, failure classification, and runtime tooling. Partner with Network Automation Engineers on design‑time automation while owning runtime operational tooling that improves MTTR and operational efficiency.

Cross‑Functional Partnership: Collaborate with Deployment teams on production handover criteria, Engineering Core on design feedback from operational experience, Hardware teams on break‑fix coordination, and NOC on escalation procedures. Build strong relationships that enable seamless coordination during incidents.

About You

Proven Operations Leadership: 7+ years in network engineering with significant focus on network operations, reliability engineering, or NOC/SOC leadership. Built operational processes from scratch or significantly scaled existing operations.

Deep Technical Operations Expertise: Strong hands‑on experience operating large‑scale datacenter networks, including EVPN/VXLAN, BGP, Clos architectures, and high‑radix switching.

Reliability Engineering Mindset: Think in terms of MTTR, MTTD, and failure domains. Built monitoring and alerting systems, developed runbooks, and implemented automation that improves operational efficiency.

Incident Command Experience: Led response to critical incidents involving multiple teams and stakeholders. Stay calm under pressure, communicate clearly during outages, and drive incidents to resolution.

Nice to Haves

AI/HPC Fabric Operations: Experience with RDMA (RoCEv2), lossless Ethernet (PFC, ECN), or high‑performance networking.

Hyperscale Operations Background: Experience in network operations at hyperscale companies (Meta, Google, Microsoft, AWS) or large cloud providers.

NOC/SOC Leadership: Experience building or leading Network Operations Centers, shift scheduling, and on‑call rotation management.

Observability Stack Expertise: Familiarity with Prometheus, Grafana, ELK, Datadog, or similar.

Automation & Scripting: Comfortable with Python, Go, Ansible, Terraform.

SRE Principles: Exposure to SLO/SLI definition, error budgets, post‑incident reviews, and operational readiness reviews.

Referral Program
We offer referral incentives for qualified candidates. Contact us for more information.

About Us
Anistar Technologies is an Equal Opportunity Employer and is dedicated to fostering diversity in the workplace. Anistar utilizes E‑Verify. We offer variable hour employment on contract and contract‑to‑hire opportunities, as well as permanent placement. Candidates must have a valid driver’s license and reliable transportation, and must be able to pass drug and background checks.

Contact
Anistar Technologies
PH: 800-257-5597
Fax: 888-293-5055

CORPORATE HEADQUARTERS
4300 W Cypress St.
Suite 550
Tampa, Florida 33607

