
HCLTech is hiring: Artificial Intelligence Engineer in Ashburn
HCLTech, Ashburn, VA, United States
Senior AI Engineer – AI Center of Excellence (AI CoE)
Experience: 12+ Years
Role Overview
This is a strategic, hands‑on senior engineering role within the AI Center of Excellence (AI CoE), responsible for designing, building, and operating AI infrastructure and AI Factory platforms across hybrid environments (on‑prem, private cloud, and public cloud).
The role works closely with clients and leading OEM partners, as well as internal Sales, Pre‑Sales, and Delivery teams, to identify, shape, and execute AI‑driven business opportunities across the US and EU regions.
This is a quota‑driven, techno‑commercial role requiring deep technical execution along with stakeholder interaction and customer‑facing leadership.
Key Responsibilities
Design, deploy, and operate hybrid Kubernetes clusters across AWS, Azure, GCP, and on‑prem environments (bare metal, NVIDIA DGX, Grace Hopper).
Own production‑grade GPU infrastructure using the NVIDIA GPU Operator, including:
CUDA, drivers, MIG
GPU‑aware scheduling and resource isolation policies
Build and maintain high‑availability, scalable AI platforms supporting enterprise workloads.
MLOps & GenAI Platform Development
Kubeflow Pipelines
GitOps (Argo CD / Flux)
MLflow / DVC
Deploy and operate Large Language Models (LLMs) using:
NVIDIA Triton Inference Server
vLLM
Custom FastAPI / gRPC services
Implement advanced inference techniques:
Dynamic batching
Safety & content filtering integrations
Data & Retrieval-Augmented Generation (RAG)
Integrate and optimize vector databases for RAG and similarity search:
Enable scalable semantic search and GenAI-powered enterprise applications.
Observability, Security & Reliability
Implement full‑stack observability using:
Loki / ELK
OpenTelemetry
Define and monitor SLIs / SLOs for AI platforms.
Enforce security and compliance standards:
Vault / KMS
Image signing, policy enforcement
GDPR / HIPAA compliance
Cost, Performance & Capacity Optimization
Optimize GPU utilization through:
Capacity planning
Cost transparency and chargeback models
Improve platform efficiency while maintaining performance SLAs.
Enablement & Technical Leadership
Enable engineering teams through:
Technical documentation
Tutorials and best practices
Office hours and knowledge sessions
Evaluate emerging technologies and lead PoCs across:
NVIDIA innovations
Drive the AI Infra & Platform technology roadmap.
Required Experience & Skills
Technical Expertise
8+ years of hands‑on experience designing and operating production Kubernetes platforms (cloud + on‑prem).
Deep expertise in NVIDIA GPU stack (CUDA, MIG, GPU Operator).
Strong hands‑on experience with:
Kubeflow Pipelines or equivalent MLOps platforms
Large‑scale LLM deployment and inference optimization
Proficiency in Python and AI frameworks:
PyTorch, TensorFlow
Infrastructure as Code (IaC):
Experience with vector databases and RAG architectures.
Strong SRE / observability background.
Security‑first mindset with enterprise compliance exposure.
Nice to Have
Experience with NVIDIA DGX and Grace Hopper platforms.
Knowledge of OpenShift, k3s, or edge‑focused deployments.
Experience with:
KServe, LWS, serverless inference
Contributions to open‑source projects (Kubernetes, Kubeflow, Triton, Milvus, vLLM).
Certifications:
CKA
Cloud AI/ML certifications
NVIDIA certifications
Qualifications
B.E / B.Tech with a minimum of 60% across academics.
Proven experience delivering AI solutions across on‑prem, cloud, and hybrid environments.
Strong analytical, strategic thinking, and stakeholder communication skills.
Solid understanding of data centers, cloud platforms, and AI & GenAI ecosystems.
Role Specifics
A role with high ownership, visibility, and impact.
Disclaimer
HCL is an equal opportunity employer, committed to providing equal employment opportunities to all applicants and employees regardless of race, religion, sex, color, age, national origin, pregnancy, sexual orientation, physical disability or genetic information, military or veteran status, or any other protected classification, in accordance with federal, state, and/or local law. Should any applicant have concerns about discrimination in the hiring process, they should provide a detailed report of those concerns to secure@hcltech.com for investigation.
Compensation and Benefits
A candidate’s pay within the range will depend on their work location, skills, experience, education, and other factors permitted by law. This role may also be eligible for performance‑based bonuses subject to company policies. In addition, this role is eligible for the following benefits subject to company policies: medical, dental, vision, pharmacy, life, accidental death & dismemberment, and disability insurance; employee assistance program; 401(k) retirement plan; 10 days of paid time off per year (some positions are eligible for need‑based leave with no designated number of leave days per year); and 10 paid holidays per year.