
ETL Developer
System Soft Technologies, Bala Cynwyd, PA, United States
Philadelphia area - 100% onsite - relocation options
US Citizens or Green Card holders only
We are looking for an experienced ETL Developer to join our technology team at our headquarters just outside of Center City Philadelphia. As a member of our ETL team, you will participate in the design and development of our ETL infrastructure. You will work directly with technical and business teams to analyze their data needs and develop solutions for both batch and continuous data movement using enterprise-class tools and custom scripting methods.
This role will give you exposure to cutting-edge distributed database environments that support critical business activities. We are looking for someone who is excited to join our efforts as we evolve, evaluate, and promote newer technologies within this space across our firm globally.
In this role, you will:
Design, develop, and optimize PySpark data processing workflows for large-scale datasets (see the first sketch after this list).
Build and maintain real-time and batch data pipelines leveraging Apache Kafka (see the consumer sketch after this list).
Write clean, efficient, and maintainable Python code for data transformation, ETL, and automation.
Develop shell scripts and other automation scripts to support data workflows and operational tasks.
Work with relational databases (e.g., PostgreSQL, MySQL, SQL Server, Oracle) to write efficient SQL queries, manage schemas, and optimize performance.
Collaborate closely with data engineers, analysts, and platform teams to deliver end-to-end data solutions.
Troubleshoot and improve existing pipelines, ensuring performance, reliability, and scalability.
Follow best practices in version control, CI/CD, documentation, and code reviews.
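A minimal sketch of the kind of PySpark batch workflow described above. The input/output paths, column names, and aggregation logic are hypothetical placeholders, not part of this posting:

```python
# Minimal PySpark batch-transformation sketch (paths and columns are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-example").getOrCreate()

# Read raw events, derive a date column, and aggregate per customer.
events = spark.read.parquet("s3://example-bucket/raw/events/")  # hypothetical path
daily = (
    events
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("customer_id", "event_date")
    .agg(
        F.count("*").alias("event_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Write partitioned output for downstream consumers.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/curated/daily_activity/"  # hypothetical path
)
spark.stop()
```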
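And a minimal sketch of the Kafka consumer side, assuming the confluent-kafka Python client (the broker address, topic name, and consumer group are placeholders):

```python
# Minimal Kafka consumer sketch (client library, broker, and topic are assumptions).
import json
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # hypothetical broker
    "group.id": "etl-example",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])  # hypothetical topic

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None:
            continue  # no message within the poll timeout
        if msg.error():
            print(f"consumer error: {msg.error()}")
            continue
        record = json.loads(msg.value())
        # ... transformation / load step would go here ...
        print(record)
finally:
    consumer.close()
```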
What we're looking for
Bachelor's degree in Computer Science or a related technical discipline is required
5+ years of hands-on development experience in data engineering or backend engineering roles.
Strong proficiency in:
PySpark (RDDs, DataFrames, Spark SQL, performance tuning)
Apache Kafka (producers/consumers, topics, partitions, schema management)
Python (data manipulation, modular code, error handling)
Scripting (Bash, Shell, or similar)
Experience working with relational databases and writing optimized SQL queries.
Solid understanding of distributed systems and large-scale data processing concepts.
Familiarity with ETL best practices, data modeling, and pipeline orchestration.
Basic understanding of LLM operations (LLMOps), including prompt logging, model monitoring, and evaluation.
Knowledge of Spark cluster management, resource optimization, and tuning strategies is preferred.
Experience with workflow orchestration tools such as Apache Airflow is a plus (see the orchestration sketch below).
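For the orchestration points above, a minimal Airflow DAG sketch, assuming Airflow 2.x; the DAG id, schedule, and task bodies are placeholders illustrating an extract-transform-load dependency chain, not a production pipeline:

```python
# Minimal Airflow DAG sketch (DAG id, schedule, and task callables are hypothetical).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull source data")

def transform():
    print("clean and reshape")

def load():
    print("write to warehouse")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Run the three steps in sequence.
    t_extract >> t_transform >> t_load
```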