PySpark Lead

Purple Drive, Owings Mills, MD, United States


Overview:

Role: PySpark Lead

Location: Owings Mills, MD

Role Description:

• 10+ years of experience in big data and distributed computing.

• Very strong hands-on experience with PySpark, Apache Spark, and Python.

• Strong hands-on experience with SQL and NoSQL databases (DB2, PostgreSQL, Snowflake, etc.).

• Proficiency in data modeling and ETL workflows.

• Proficiency with workflow schedulers such as Airflow.

• Hands-on experience with AWS cloud-based data platforms.

• Experience in DevOps, CI/CD pipelines, and containerization (Docker, Kubernetes) is a plus.

• Strong problem-solving skills and the ability to lead a team.

• Lead the design, development, and deployment of PySpark-based big data solutions.

• Architect and optimize ETL pipelines for structured and unstructured data.

• Collaborate with the client, data engineers, data scientists, and business teams to understand requirements and provide scalable solutions.

• Optimize Spark performance through partitioning, caching, and tuning.

• Implement best practices in data engineering (CI/CD, version control, unit testing).

• Work with cloud platforms such as AWS.

• Ensure data security, governance, and compliance.

• Mentor junior developers and review code for best practices and efficiency.