
HPC Systems Administrator (Hardware & Infrastructure Operations)
Stanford University, Stanford, CA, United States
Business Affairs: University IT (UIT), Stanford, California, United States
Information Technology Services
Post Date Apr 17, 2026
Requisition # 108777
Please note:
Visa Sponsorship is not provided for this position.
The Sherlock HPC cluster is the flagship of Stanford's research computing environment, supporting thousands of users and a massive variety of scientific workloads. We are looking for an HPC Systems Administrator who thrives at the intersection of high-density hardware and Linux systems engineering.
In this role, you will be the primary steward of the physical infrastructure on Sherlock and other platforms. You will ensure that our 1,500+ compute nodes, high-density GPU racks, and petabyte-scale storage arrays are meticulously maintained, expertly tuned, and highly available.
Why Stanford?
You won't just be swapping parts; you will be managing the physical backbone of a world-class research environment. From debugging errors on NVIDIA H200s to optimizing InfiniBand cabling for our Lustre scratch tiers, your work is the foundation upon which Nobel-caliber research is built.
Primary Responsibilities
Hardware Lifecycle & Deployment:
Lead the physical deployment, burn-in, troubleshooting, and decommissioning of compute nodes, GPU servers, and high-density storage systems.
Diagnostics & Root Cause Analysis:
Troubleshoot hardware issues, such as memory errors, GPU thermal throttling, and network failures, and coordinate with vendors for support and replacements.
Data Center Operations:
Collaborate with the data centers team to plan and manage hardware deployments.
Provisioning & Automation:
Work with lead platform administrators on testing and provisioning to ensure rapid, consistent deployment of cluster images across the fleet.
Health & Telemetry:
Refine hardware-level monitoring to proactively identify failing components before they impact active research jobs.
Required Qualifications:
Education:
Bachelor's degree and eight years of relevant experience, or a combination of education and relevant experience.
Experience:
3-5+ years of experience in Linux systems administration, with a strong preference for candidates from HPC, large-scale data center, or research environments.
Hardware Proficiency:
Solid understanding of x86 server architecture, GPU systems, Ethernet, and high-performance interconnects.
Scripting:
Proficiency in scripting languages for automating hardware health checks, log parsing, and routine maintenance tasks.
Infrastructure Management:
Experience using configuration management tools to manage hardware settings and firmware versions at scale. Experience working with data center teams to populate and maintain DCIM solutions preferred.
Physical Requirements:
Ability to lift up to 50 lbs and work comfortably in a data center environment, including racking equipment and managing complex cable topologies.
Communication:
Strong written and verbal communication skills.
Preferred Skills
• Direct experience maintaining hardware for HPC systems and large-scale storage systems.
• Familiarity with the Slurm workload manager and how hardware health impacts job scheduling.
• Exposure to liquid cooling solutions or high-density rack power management.
Physical Requirements*:
• Constantly perform desk-based computer tasks.
• Frequently sit, grasp lightly/fine manipulation.
• Occasionally stand/walk, writing by hand.
• Rarely use a telephone, lift/carry/push/pull objects that weigh up to 10 pounds.
Consistent with its obligations under the law, the University will provide reasonable accommodations to applicants and employees with disabilities. Applicants requiring a reasonable accommodation for any part of the application or hiring process should contact Stanford University Human Resources by submitting a contact form.
Working Conditions:
• May work extended hours, evenings, and weekends.
Work Standards:
• Interpersonal Skills: Demonstrates the ability to work well with Stanford colleagues and clients and with external organizations.
• Promote Culture of Safety: Demonstrates commitment to personal responsibility and value for safety; communicates safety concerns; uses and promotes safe behaviors based on training and lessons learned.
• Subject to and expected to stay in sync with all applicable University policies and procedures, including but not limited to the personnel policies and other policies found in Stanford's Administrative Guide, http://adminguide.stanford.edu.
The expected pay range for this position is $150,289 to $171,674 per annum.
Stanford University provides pay ranges representing its good faith estimate of the salary or hourly wage the university reasonably expects to pay for a position. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location and external market pay for comparable jobs.
At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website (https://cardinalatwork.stanford.edu/benefits-rewards) provides detailed information on Stanford's extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.
The job duties listed are typical examples of work performed by positions in this job classification and are not designed to contain or be interpreted as a comprehensive inventory of all duties, tasks, and responsibilities. Specific duties and responsibilities may vary depending on department or program needs without changing the general nature and scope of the job or level of responsibility. Employees may also perform other duties as assigned.
Stanford is an equal employment opportunity and affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law.
Additional Information
Schedule: Full-time
Job Code: 4833
Employee Status: Regular
Grade: K
Requisition ID: 108777
Work Arrangement: On Site