Kubernetes Big Data Engineer

Eliassen Group, Rockville, MD, United States


Description:
Hybrid: 3 days onsite / 2 days remote in Rockville, MD

Our client seeks a Big Data Engineer to design and optimize large-scale data processing on AWS using Spark and Kubernetes. The engineer will implement containerized workloads on Amazon EMR on EKS, build scalable data pipelines, and improve performance, reliability, and observability. The role also involves collaborating with cross‑functional teams, applying Spark tuning expertise, and managing Kubernetes‑based infrastructure to support data‑driven outcomes. Financial industry experience is beneficial.

We can facilitate W2 and corp‑to‑corp consultants. For our W2 consultants, we offer a great benefits package that includes medical, dental, and vision coverage, a 401(k) with company matching, and life insurance.

Rate: $54.00 to $64.00/hr (W2)

JN -042026-106358

Responsibilities:

Design, develop, and maintain large‑scale data processing pipelines using Hadoop, Spark, Python, and Scala.

Architect and deploy containerized big data workloads on Amazon EMR on EKS.

Design and implement Kubernetes‑based infrastructure for running Spark applications at scale.

Implement scalable ingestion, storage, transformation, and analysis solutions.

Stay current with industry trends and emerging big data technologies to improve architecture.

Collaborate with cross‑functional teams to translate business requirements into technical solutions.

Optimize and enhance existing data pipelines for performance, scalability, and reliability.

Develop automated testing frameworks and implement continuous testing for data quality.

Conduct unit, integration, and system testing for data pipeline robustness and accuracy.

Support data scientists and analysts with reliable datasets and tooling.

Write and maintain automated unit, integration, and end‑to‑end tests.

Monitor and troubleshoot production data pipelines and resolve issues.

Manage Kubernetes clusters, pods, services, and deployments for big data workloads.

Experience Requirements:

Hands‑on experience with AI development tools such as GitHub Copilot, Q Developer, ChatGPT, or Claude.

Proficiency with Hadoop, Spark, Hive, and Trino.

Experience addressing data skew, petabyte‑scale processing, and remediation of resource, data quality, and scalability issues.

Strong Kubernetes experience including pods, services, deployments, namespaces, ConfigMaps, and Secrets.

Hands‑on EMR on EKS experience for Spark workloads.

Kubernetes resource management, scheduling, and auto‑scaling expertise.

Knowledge of Helm charts, Kubernetes networking, PVs/PVCs, security best practices, kubectl, and YAML manifests.

Ability to troubleshoot cluster issues, pod failures, resource constraints, and Spark‑on‑Kubernetes integration with dynamic allocation.

Experience with prompt engineering, AI workflow design, AI‑driven analysis, and change management for AI adoption.

Deep understanding of Spark internals including executors, tasks, stages, and DAGs.

Spark performance tuning including partitioning, caching, and broadcast joins with proven optimization of large datasets.

Experience troubleshooting slow or stuck Spark jobs and resource issues.

Strong AWS experience including S3, EMR, EMR on EKS, Glue, Lambda, and Athena.

Hands‑on S3 usage with Spark including file formats and consistency behaviors.

Amazon EKS architecture and best practices, IAM roles for service accounts (IRSA), and VPC networking for EKS.

AWS monitoring and logging for Kubernetes using CloudWatch and CloudTrail; familiarity with serverless services such as Lambda and Fargate.

Proficiency in Python or Scala, with the ability to write clean, modular, performant code and apply functional programming concepts.

Strong understanding of collections, concurrency, and memory management.

Advanced SQL skills including window functions, multi‑table joins, aggregations, and edge case handling.

Production ETL/pipeline management experience (preferred).

CI/CD experience with tools such as Jenkins, GitLab CI, GitHub Actions, or ArgoCD (preferred).

Infrastructure as Code for EKS and EMR on EKS using Terraform or CloudFormation (preferred).

Comprehensive test case development and test automation experience (preferred).

Docker and container image optimization experience (preferred).

Knowledge of service mesh such as Istio or Linkerd (preferred).

Monitoring and observability with Prometheus, Grafana, or ELK (preferred).

AWS or Kubernetes certifications such as AI Practitioner, Solutions Architect, Big Data Specialty, CKA, or CKAD (preferred).

Experience with GitOps practices for Kubernetes deployments (preferred).

Education Requirements:

Bachelor's degree in Computer Science, Information Systems, or related discipline, or equivalent training and work experience.

Master's degree preferred.

AWS certifications such as AI Practitioner, Solutions Architect, or Big Data Specialty (preferred).

Kubernetes certifications such as CKA or CKAD (preferred).
