Overview
At PNNL, our core capabilities are divided among major departments called Directorates. Each Directorate focuses on a specific area of scientific research or another function and has its own leadership team and dedicated budget.
Our Science & Technology directorates include National Security, Earth and Biological Sciences, Physical and Computational Sciences, and Energy and Environment. In addition, the PNNL campus houses the Environmental Molecular Sciences Laboratory, a Department of Energy Office of Science user facility.
The National Security Directorate (NSD) drives science-based, mission-focused solutions to take on complex, real-world threats to our nation and the world.
The AI and Data Analytics Division, part of NSD, combines domain expertise with the integration of advanced hardware and software to deliver computational solutions that address complex data and analytic challenges. Working in multidisciplinary teams, we connect foundational research with engineering and operations, providing the tools to innovate quickly and field results faster. Our strengths are integrated across the data analytics lifecycle, from data acquisition and management to analysis and decision support.
Responsibilities
We are seeking an exceptional Lead Software Engineer to architect and build next-generation AI systems at PNNL spanning agentic AI platforms, petabyte-scale data orchestration, and real-time intelligence processing that define the future of national security technology. This role combines deep technical leadership in scalable system design with hands-on expertise in modern AI/ML engineering, requiring someone who can operate with both strategic vision and tactical excellence.
Who You Are
You're an experienced engineer who bridges infrastructure, AI/ML systems, and production-grade software development. You have built highly scalable systems from scratch, led technical initiatives that matter, and have a track record of transforming complex problems into tractable solutions. You are as comfortable architecting distributed systems that process terabytes per hour as you are fine-tuning LLMs or building developer tooling. You bring startup agility to mission-critical work where failure isn't an option.
What You'll Build
AI-Native Systems & Platforms
- Design and deploy scalable agentic AI systems with dynamic reasoning and decision-making capabilities
- Architect LLM orchestration frameworks using LangChain, LlamaIndex, and emerging agent platforms
- Build MLOps platforms spanning experiment tracking, model versioning, deployment, and governance
- Develop developer-focused tooling, adapters, and interfaces for AI-native frameworks
- Integrate multi-modal data sources (text, vision, structured/sensor data) into cohesive reasoning pipelines
Scalable Infrastructure & Data Systems
- Design microservices architectures coordinating across multiple domains and security enclaves
- Lead distributed system design processing data from hundreds of sources simultaneously
- Architect real-time streaming platforms handling terabytes per hour with event-driven architectures
- Build robust data pipelines for petabyte-scale ETL, data lake/mesh architectures, and real-time analytics
- Design container orchestration (Kubernetes) and CI/CD pipelines for classified and edge environments
Mission-Critical Production Systems
- Deploy AI systems in highly secure environments with resilient agent-to-agent communications
- Create monitoring and observability systems (logging, metrics, tracing) across secure enclaves
- Ensure compliance with ethical AI standards and security-first DevOps practices
- Build geospatial processing, time-series, and intelligence data fusion capabilities
Technical Leadership
- Lead a team of engineers to deliver on high-risk, high-impact, and ambiguous technical scope
- Drive technical strategy and architectural decisions across cross-functional teams
- Translate ambiguous requirements and cutting-edge research into actionable technical roadmaps
- Lead design discussions shaping team-wide engineering standards
- Mentor engineering teams and guide junior scientists/engineers
Technical Knowledge, Skills, and Abilities
Technical Leadership & Engineering Excellence
- Demonstrated fluency in Python and proficiency in at least one additional language (C#/.NET, Go, C++) with ability to architect solutions and guide language selection decisions across complex, multi-language codebases
- Proven track record of establishing and championing software engineering best practices including version control strategies, comprehensive automated testing frameworks, code quality standards, and technical documentation across engineering teams
- Expert-level proficiency in designing and implementing sophisticated CI/CD pipelines with ability to define DevOps strategies, build/release processes, and deployment architectures that ensure reliable, secure, and efficient software delivery at scale
- Seasoned practitioner with the ability to lead engineering teams in defining technical specifications, architectural patterns, and system designs for microservices, distributed systems, and large-scale applications while leveraging AI-assisted development tools to accelerate team productivity
AI/ML Systems Architecture & Implementation
- Proven experience architecting, implementing, and deploying production-grade agentic AI systems with multi-step reasoning, autonomous workflows, and decision-making capabilities into operational environments at scale
- Extensive practical expertise with deep learning frameworks (PyTorch, TensorFlow, JAX) and LLM orchestration platforms (LangChain, LlamaIndex, LangGraph) with the ability to design complex AI applications, custom chains, retrieval systems, and agent-based architectures
- Advanced expertise in LLM optimization techniques including fine-tuning methodologies (LoRA/PEFT, QLoRA), retrieval-augmented generation (RAG) system design, prompt engineering strategies, and evaluation frameworks
- Comprehensive understanding of the end-to-end machine learning lifecycle with proven ability to architect and build production ML platforms including feature engineering pipelines, model serving infrastructure, monitoring, and automated retraining systems
Cloud Architecture & Distributed Systems
- Demonstrated expertise architecting and deploying enterprise-scale applications across cloud platforms (AWS, Azure, GCP) with ability to design multi-cloud strategies and advanced proficiency in containerization (Docker) and orchestration technologies (Kubernetes) including Infrastructure as Code practices
- Expert ability to architect and implement sophisticated event-driven systems using message brokers (Kafka, RabbitMQ), pub/sub patterns, and serverless functions with consideration for exactly-once semantics, ordering guarantees, and failure handling
- Mastery of cloud-native API design patterns including RESTful principles, GraphQL schemas, and gRPC services with proven experience establishing API standards, versioning strategies, and microservice communication patterns for large-scale distributed systems
- Deep understanding of data storage architecture including relational databases (PostgreSQL, MySQL), NoSQL systems (MongoDB, DynamoDB, Cassandra), and data warehouses (Redshift, Snowflake, BigQuery) with ability to design polyglot persistence strategies optimized for specific workload characteristics
Data Platform Engineering & Distributed Processing
- Mastery of cloud-native data pipeline architectures including ETL/ELT design patterns, orchestration frameworks (Airflow, Prefect, Step Functions), and cloud services (AWS Glue, Lambda, Azure Data Factory) with ability to architect enterprise-scale data platforms
- Expert knowledge of distributed data storage systems (S3, Redshift, Delta Lake, PostgreSQL, MongoDB, OpenSearch, Databricks) with proven ability to design data lakehouse architectures and advanced proficiency with distributed computing frameworks (Spark/Databricks, Kafka, Flink, Ray)
- Demonstrated expertise deploying and optimizing scalable ML workloads on distributed platforms using Kubernetes, Ray clusters, or Spark with deep understanding of data modeling principles including schema design, normalization/denormalization strategies, and data quality frameworks
- Proven ability to architect petabyte-scale data systems with appropriate partitioning strategies, indexing approaches, and query optimization patterns while mastering data format selection (Parquet, Avro, ORC, Delta, Iceberg) for optimal compression, performance, and schema evolution
Leadership & Communication
- Proven ability to lead and mentor engineering teams through technical challenges, architecture discussions, and knowledge sharing while establishing team standards for code quality, testing practices, and architectural patterns through mentorship and leading by example
- Expert communication skills to articulate complex technical concepts, system designs, and strategic recommendations to diverse audiences including engineering teams, executive leadership, and stakeholders through comprehensive documentation, architecture decision records, and presentations
- Strategic ability to balance competing priorities including technical excellence, delivery velocity, technical debt management, and innovation while making pragmatic trade-offs that align with organizational objectives
- Experience leading technical planning initiatives including system architecture design, technology evaluation, and roadmap development with proven capability to drive cross-functional collaboration and champion process improvements while adapting to rapidly evolving technical landscapes
What Makes This Role Unique
This role sits at the intersection of cutting-edge AI research and production systems engineering: you will build platforms that handle sensitive data while mentoring teams and shaping technical strategy. You will work with startup agility on problems of national importance, iterating on solutions that push the boundaries of what's possible in AI and distributed systems.
National Interest Project Examples
- Detecting and preventing smuggling at ports of entry
- Developing large data pipelines to thwart funding for terrorists and drug cartels
- Applying big data solutions to national security problems
- Applying image classification to nuclear forensics analysis
- Developing capabilities for scalable geospatial analytics
Location and Schedule
This position is based in Richland, WA or Seattle, WA and requires an onsite presence Monday through Thursday, with Friday as required by business needs.
Qualifications
Minimum Qualifications:
- PhD and 3 years of Software Engineering experience -OR-
- MS/MA and 5 years of Software Engineering experience -OR-
- BS/BA and 7 years of Software Engineering experience -OR-
- AA and 16 years of Software Engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development -OR-
- HS/GED and 18 years of Software Engineering experience in designing, architecting, programming, deploying, and automating software solutions in support of scientific research or consumer digital product development
Preferred Qualifications:
- Degree in computer science, software engineering, or related field
- 7+ years of professional software engineering experience with at least 3-5 years in technical leadership or senior engineering roles
- Track record of leading development of production systems serving significant user bases or processing substantial data volumes
- Experience building and leading high-performing engineering teams through mentorship and professional development
- Demonstrated experience leading teams of software engineers and translating complex technical problems into structured, actionable work
- Experience establishing engineering practices, architectural standards, and technical strategies at organizational scale
- Background in multiple domains (AI/ML, distributed systems, data engineering, cloud infrastructure) with ability to bridge technical disciplines
- Prior experience in mission-critical, regulated, or high-security environments (government, defense, healthcare, financial services)
- Established thought leadership and technical influence through open-source maintainership, published technical articles or conference talks, or recognized expertise in specific domains
Hazardous Working Conditions/Environment
Not applicable.
Additional Information
This position requires the ability to obtain and maintain a federal security clearance.
Security, Credentialing, and Eligibility Requirements
As a national laboratory, PNNL is responsible for adhering to security and federal background investigations. Applicants will be subject to background investigations and must meet eligibility requirements for access to classified matter as applicable.
Mandatory Requirements
Please be aware of DOE restrictions regarding affiliations with foreign governments and related disclosures upon offer.
Minimum Salary
USD $161,300.00/Yr.
Maximum Salary
USD $255,000.00/Yr.