
ETL Developer
System Soft Technologies, Bala Cynwyd, PA, United States
Philadelphia area - 100% onsite - relocation options
US Citizens or Green Card holders only
We are looking for an experienced ETL Developer to join our technology team at our headquarters just outside of Center City Philadelphia. As a member of our ETL team, you will participate in the design and development of our ETL infrastructure. You will work directly with technical and business teams to analyze their data needs and develop solutions for both batch and continuous data movement using enterprise-class tools and custom scripting methods.
This role will give you exposure to cutting-edge distributed database environments that support critical business activities. We are looking for someone who is excited to join our efforts as we evolve, evaluate, and promote newer technologies within this space across our firm globally.
In this role, you will:
Design, develop, and optimize PySpark data processing workflows for large-scale datasets (see the first sketch after this list).
Build and maintain real-time and batch data pipelines leveraging Apache Kafka (see the consumer sketch after this list).
Write clean, efficient, and maintainable Python code for data transformation, ETL, and automation.
Develop shell scripts and other automation scripts to support data workflows and operational tasks.
Work with relational databases (e.g., PostgreSQL, MySQL, SQL Server, Oracle) to write efficient SQL queries, manage schemas, and optimize performance.
Collaborate closely with data engineers, analysts, and platform teams to deliver end-to-end data solutions.
Troubleshoot and improve existing pipelines, ensuring performance, reliability, and scalability.
Follow best practices in version control, CI/CD, documentation, and code reviews.
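A minimal sketch of the kind of PySpark batch workflow described above. The input/output paths, column names, and aggregation logic are hypothetical placeholders, not part of this posting:

```python
# Minimal PySpark batch-transformation sketch (paths and columns are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Read raw events, derive a date column, and aggregate per customer.
events = spark.read.parquet("s3://example-bucket/raw/events/")  # hypothetical path
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Write partitioned output for downstream consumers.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_activity/"  # hypothetical path
)
spark.stop()
```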
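And a minimal sketch of the Kafka consumer side, assuming the confluent-kafka Python client (the broker address, topic name, and consumer group are placeholders):

```python
# Minimal Kafka consumer sketch (client library, broker, and topic are assumptions).
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # hypothetical broker
    "group.id": "etl-example",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])  # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # no message within the poll timeout
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        record = json.loads(msg.value())
        # ... transformation / load step would go here ...
        print(record)
finally:
    consumer.close()
```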
What we're looking for
Bachelor's degree in Computer Science or a related technical discipline is required
5+ years of hands-on development experience in data engineering or backend engineering roles.
Strong proficiency in:
PySpark (RDDs, DataFrames, Spark SQL, performance tuning)
Apache Kafka (producers/consumers, topics, partitions, schema management)
Python (data manipulation, modular code, error handling)
Scripting (Bash, Shell, or similar)
Experience working with relational databases and writing optimized SQL queries.
Solid understanding of distributed systems and large-scale data processing concepts.
Familiarity with ETL best practices, data modeling, and pipeline orchestration.
Basic understanding of LLM operations (LLMOps), including prompt logging, model monitoring, and evaluation.
Knowledge of Spark cluster management, resource optimization, and tuning strategies is preferred.
Experience with workflow orchestration tools such as Apache Airflow is a plus (see the orchestration sketch below).
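For the orchestration points above, a minimal Airflow DAG sketch, assuming Airflow 2.x; the DAG id, schedule, and task bodies are placeholders illustrating an extract-transform-load dependency chain, not a production pipeline:

```python
# Minimal Airflow DAG sketch (DAG id, schedule, and task callables are hypothetical).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source data")

def transform():
    print("clean and reshape")

def load():
    print("write to warehouse")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Run the three steps in sequence.
    t_extract >> t_transform >> t_load
```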