
Collibra Data Lineage Automation Engineer
3B Staffing LLC, McLean, VA, United States
6-Month Contract
Location: Role is on-site 5 days/week in McLean, VA.
Max rate: $68/hr on C2C
We are seeking a highly experienced Data Lineage Automation Engineer to lead the design and implementation of automated end-to-end lineage solutions across a highly heterogeneous enterprise data ecosystem. This role requires deep technical expertise in lineage frameworks (such as Spline and OpenLineage), experience across cloud and legacy environments, and a strong AI foundation to support intelligent metadata extraction and traceability.
Key Responsibilities
• Lead the implementation of automated data lineage across a complex data estate that includes:
  o Cloud platforms (e.g., Snowflake, AWS)
  o Legacy relational databases and ETLs
  o NoSQL data stores
  o BI/reporting platforms (e.g., Tableau, Power BI)
• Implement or extend frameworks such as Spline, OpenLineage, or similar open frameworks to support active lineage capture
• Build connectors, extractors, or agents where necessary to bridge gaps between systems and lineage frameworks
• Integrate with metadata platforms (e.g., Collibra) to publish lineage in a consumable format
• Apply AI/ML techniques to infer lineage where automation is incomplete (e.g., handling Java-based ETLs), using logs, query patterns, or usage metadata
• Develop reusable lineage components that can be applied across domains
• Guide stakeholders on best practices for lineage standardization, storage, and use
Required Skills & Experience
• Proven experience delivering automated data lineage solutions across hybrid architectures
• Hands-on expertise with Spline, OpenLineage, Marquez, or comparable lineage frameworks
• Deep understanding of metadata capture, ETL process tracing, and query execution mapping
• Strong AI/ML background - particularly in metadata intelligence, natural language processing for code parsing, or pattern detection
• Experience integrating lineage with data governance tools (e.g., Collibra, Alation)
• Strong programming background in Python, Scala, or Java
• Deep familiarity with SQL and query logs from systems such as Snowflake, SQL Server, Oracle, and MongoDB
Big Plus Skills
• Experience evaluating and implementing third-party commercial data lineage solutions
• Prior work in regulated environments (e.g., financial services, healthcare)
• Familiarity with event-based architectures for real-time lineage propagation
• Knowledge of data mesh or domain-driven lineage strategies
Ideal Candidate
• Has successfully implemented automated lineage at enterprise scale
• Operates at the intersection of data engineering, metadata management, and AI
• Can act as a technical thought partner to architecture teams and governance leads
• Brings an automation-first, reuse-oriented design mindset