
Data Engineer (m/f/d)
Neuland AI AG, New Bremen, OH, United States
Purpose of the role
To power reliable AI automation at scale, we are looking for a Data Engineer who can design and build robust production data pipelines, distributed data processing systems, and high-quality data foundations for our AI platform. You will play a key role in ensuring that AI systems have fast, secure, and structured access to the data they need.
Your mission
Design and build scalable data pipelines for ingesting, transforming, and serving structured and unstructured data
Develop distributed data processing workflows to support AI features such as knowledge retrieval, automation workflows, and analytics
Build and maintain data ingestion systems connecting enterprise APIs, databases, file storage, and streaming sources
Model and optimize datasets for AI applications, including embeddings pipelines and vector indexing workflows
Ensure data reliability, consistency, and observability across pipelines and storage layers
Optimize query performance, data freshness, and cost efficiency in large-scale data systems
Work closely with AI engineers to enable Retrieval-Augmented Generation (RAG) and knowledge-based AI features
Implement batch and real-time processing mechanisms using queues, streaming systems, or event-driven architectures
Design and maintain data storage solutions including relational databases, object storage, and vector databases
Implement data governance practices including access control, lineage tracking, and monitoring
Collaborate with DevOps on containerized deployments, infrastructure automation, and cloud-based data services
Collaborate with our Data Research team
Your profile
Strong experience building production data pipelines
Good understanding of distributed systems and scalable data architectures
Experience working with large datasets and optimizing data performance
Familiarity with modern data tooling and workflow orchestration
Experience enabling data access for AI/ML or analytics use cases
Pragmatic engineering mindset with focus on reliability and maintainability
Ability to collaborate across backend, AI, and infrastructure teams
Curiosity about AI-driven data systems and emerging data infrastructure patterns
Tech Stack & Areas
SQL (minimum 5 years)
Hands-on experience with Python
PostgreSQL / analytical databases
Distributed processing concepts
Streaming / messaging systems
Data modeling & pipeline orchestration
Vector databases & embeddings pipelines
Cloud platforms (Azure, GCP, or AWS)
Docker & CI/CD
What we offer
Impact: Build intelligent production systems that redefine how companies use AI
Innovation: Work with cutting-edge frameworks and model ecosystems
Culture: Collaborative, creative, and ownership-driven team
Flexibility: Remote-first and flexible working hours
Growth: Access to AI resources, tools, and training
Participation: Virtual Stock Option Plan (VSOP)