
Associate Principal, Storage Engineering
The Options Clearing Corporation, Chicago, Illinois, United States, 60290
* Lead the design, deployment, and maintenance of enterprise Linux server environments (RHEL, CentOS, Ubuntu, SUSE, Amazon Linux) with hands-on configuration and troubleshooting across on-premises and AWS cloud infrastructure
* Plan, execute, and manage enterprise-wide Linux patching strategies including security patches, kernel updates, and critical vulnerability remediation across thousands of servers
* Develop and maintain comprehensive disaster recovery (DR) plans for Linux infrastructure including RPO/RTO targets, failover procedures, and recovery testing schedules
* Implement and enforce CIS (Center for Internet Security) benchmarks and security baselines across all Linux systems including automated compliance scanning, remediation, and reporting
* Plan, execute, and manage RHEL operating system upgrades across enterprise environments including in-place upgrades (Leapp), migration strategies, and rollback procedures
* Develop and implement infrastructure automation strategies using Ansible Automation Platform (AAP) including playbook development, workflow orchestration, and automation controller management
* Manage and optimize Red Hat Satellite infrastructure for system provisioning, patch management, and content lifecycle management across the enterprise
* Implement and manage automated patching workflows using Red Hat Satellite, Ansible, and AWS Systems Manager for both on-premises and cloud environments
* Design, deploy, and manage AWS Linux EC2 instances including instance configuration, auto-scaling, and integration with AWS services
* Create, maintain, and manage AMI (Amazon Machine Image) lifecycle including image hardening, patching, golden image development, and automated AMI pipeline creation
* Implement AMI versioning strategies, testing procedures, and distribution processes across multiple AWS accounts and regions
* Design and implement disaster recovery solutions including backup strategies, replication technologies, failover automation, and multi-region/multi-site architectures
* Design and maintain NFS storage solutions and distributed file systems for enterprise applications
* Architect, deploy, and manage OpenShift container platforms and Kubernetes environments in hybrid cloud configurations
* Implement and support Red Hat Dev Spaces for cloud-native development workflows
* Conduct regular DR drills and testing to validate backup and recovery procedures
* Develop and maintain security hardening standards based on CIS benchmarks, STIG requirements, and organizational security policies
* Manage incidents, requests, and change management processes using ITSM tools such as ServiceNow including ticket resolution, escalations, and SLA compliance
* Maintain technical documentation, knowledge base articles, runbooks, and operational procedures in Confluence
* Establish and enforce Linux server security standards, hardening procedures, and compliance protocols across on-premises and cloud environments
* Oversee system performance monitoring, capacity planning, and optimization initiatives across all platforms
* Provide escalation support for complex technical issues and lead incident response efforts
* Collaborate with cross-functional teams including networking, storage, security, and application development
* Drive continuous improvement initiatives and evaluate emerging Red Hat, AWS, and cloud-native technologies
* Create and maintain comprehensive technical documentation, runbooks, and standard operating procedures
* Participate in on-call rotation and provide 24/7 support for critical systems as needed
* Lead vendor management activities and coordinate with Red Hat and AWS support
* Provide technical mentorship and guidance to Linux administrators and junior team members
* Lead technical training sessions and knowledge transfer initiatives on Ansible, Satellite, OpenShift, AWS, patching, and DR procedures
* 10+ years of progressive hands-on experience in Linux/Unix system administration
* 5+ years in a technical leadership or senior engineering role
* Strong hands-on experience with Ansible Automation Platform (AAP) including automation controller, execution environments, and workflow development
* Proven expertise in Red Hat Satellite for system lifecycle management and content management
* Extensive experience planning and executing enterprise-scale Linux patching programs including change management, patch testing, and emergency patching procedures
* Demonstrated experience designing and implementing disaster recovery solutions for Linux infrastructure including backup/restore, replication, and failover strategies
* Demonstrated experience planning and executing RHEL OS upgrades across major versions (e.g., RHEL 7 to 8, RHEL 8 to 9) using Leapp and other upgrade methodologies
* Extensive hands-on experience with AWS Linux EC2 instances, including Amazon Linux and RHEL on AWS
* Demonstrated experience in AMI creation, customization, hardening, and lifecycle management
* Proven track record of building automated AMI pipelines using tools such as Packer, Ansible, or AWS Image Builder
* Demonstrated experience with AWS cloud services and hybrid cloud architectures
* Extensive hands-on experience with OpenShift container platform and Kubernetes orchestration
* Demonstrated experience implementing and managing NFS and distributed storage solutions
* Working knowledge of Red Hat Dev Spaces for development environment provisioning
* Proven track record of designing and implementing large-scale automated Linux infrastructure in hybrid environments
* Strong understanding of DevOps principles and CI/CD methodologies
* Excellent problem-solving abilities and analytical thinking skills
* Outstanding communication skills with ability to explain technical concepts to non-technical stakeholders
* Strong project management capabilities and ability to manage multiple priorities
* Red Hat certifications (RHCE, RHCA) and/or AWS certifications (Solutions Architect, SysOps Administrator) highly preferred
* Advanced hands-on proficiency in Red Hat Enterprise Linux administration and troubleshooting
* Extensive experience with Linux patching and patch management including:
+ Enterprise-scale patch deployment using Red Hat Satellite and Ansible
+ Patch testing and validation in non-production environments
+ Emergency and zero-day vulnerability patching procedures
+ Kernel patching strategies including live patching (kpatch)
+ Patch rollback and recovery procedures
+ Compliance reporting and audit trail maintenance
+ Patch scheduling and maintenance window coordination
+ AWS Systems Manager Patch Manager for cloud-based patching
* Expert-level disaster recovery and business continuity experience including:
+ Backup and restore strategies (Bacula, Veeam, AWS Backup, snapshots)
+ Replication technologies (rsync, DRBD, storage-level replication)
+ Multi-site and multi-region DR architectures
+ RPO/RTO analysis and optimization
+ Failover and failback automation
+ DR testing and validation procedures
+ Disaster recovery documentation and runbooks
+ Cloud-based DR solutions (AWS disaster recovery services)
* Extensive experience with RHEL OS upgrade processes including:
+ In-place upgrades using Leapp utility (RHEL 7→8, RHEL 8→9)
+ Pre-upgrade assessment and compatibility testing
+ Application compatibility validation and remediation
+ Upgrade automation using Ansible and Satellite
+ Rollback and disaster recovery planning for upgrade failures
+ Post-upgrade validation and system optimization
+ Managing kernel and package dependencies during upgrades
* Expert-level experience with Ansible Automation Platform (AAP) including playbook development, roles, collections, automation controller, and execution environments
* Strong expertise in Red Hat Satellite for provisioning, patch management, configuration management, and content views
* Extensive hands-on experience with AWS Linux EC2 including instance management,