
Senior Storage Systems Administrator
Science Gateways, Palo Alto, CA, United States
Storage Architect - Senior Storage Systems Administrator - Stanford University
Reports to:
Research & HPC Data Platforms Team Lead
Please note:
Visa sponsorship is not available for this position.
The Opportunity
Stanford University Research Computing is seeking a storage expert to join the Research & HPC Data Platforms group. This is a flexible-level opening, and the university is hiring either a Storage Architect or a Senior Storage Systems Administrator to help maintain and expand a world-class infrastructure.
In this role, you will work closely with the team lead to manage an environment spanning more than 100 PB of storage and 5 billion files, including high-performance Lustre systems, MinIO object storage, Lustre HSM, and related technologies.
Why Stanford?
You won’t simply be managing storage systems—you’ll help power a research ecosystem supporting world-class work across every academic discipline.
Storage Architect
Key Responsibilities
Architecture:
Evolve existing storage designs to meet the needs of future computing platforms and research goals.
Platform Management:
Lead scaling, reliability, security, compliance, operations, and lifecycle management of primary research storage platforms, including environments handling sensitive data.
Tiered Storage:
Oversee Lustre HSM integration on the Elm platform, including data movement between parallel filesystems and MinIO object storage.
Performance Engineering:
Optimize I/O performance for large-scale HPC and AI workloads.
Community Stewardship:
Represent Stanford in the Lustre ecosystem and related communities, contribute to upstream roadmaps, and maintain a vendor-neutral storage strategy.
Required Qualifications
Bachelor’s degree plus eight years of relevant experience, or an equivalent combination of education and experience.
8+ years of hands‑on experience designing, building, and managing Lustre, ZFS, or similar platforms at 20 PB+ scale.
Deep expertise with MinIO, Lustre HSM, copy tools, RobinHood, or comparable tools.
Expert knowledge of Linux kernel internals and large‑scale InfiniBand/Ethernet tuning.
Strong troubleshooting skills involving kernel panics, LNet congestion, and metadata bottlenecks.
Demonstrated leadership in mentoring administrators and leading large‑scale migrations without data loss.
Excellent written and verbal communication skills.
Senior Storage Systems Administrator
Key Responsibilities
Platform Management:
Support scaling, reliability, security, compliance, operations, and lifecycle management of key research storage systems.
Operational Excellence:
Perform complex filesystem upgrades, kernel patching, and hardware refreshes with minimal downtime.
Monitoring & Telemetry:
Build and maintain observability systems for real‑time I/O tracking and trend analysis.
User Support:
Serve as an escalation point for researchers dealing with I/O bottlenecks, job failures, or data access issues.
Maintenance:
Manage physical and logical health of storage systems, including RMAs, firmware upgrades, and drive replacement cycles.
Required Qualifications
5+ years of Linux systems administration experience, including 3+ years in HPC or large‑scale data environments.
Strong practical experience with Lustre, ZFS, MinIO, or similar technologies.
Advanced scripting skills for automation and log analysis.
Comfortable diagnosing hardware failures and understanding power/cooling needs for dense storage systems.
Excellent written and verbal communication skills.
Physical Requirements
Constant computer‑based desk work.
Frequent sitting and fine hand manipulation.
Occasional standing, walking, and writing by hand.
Rarely required to use a telephone or lift/carry/push/pull items up to 10 pounds.
* In accordance with applicable law, the university provides reasonable accommodations to qualified applicants and employees with disabilities. Applicants needing accommodation during the hiring process should contact Stanford Human Resources through the university’s official accommodation request process.