
Data Engineer (m/f/d)
Neuland AI AG, New Bremen, OH, United States
Purpose of the role
To power reliable AI automation at scale, we are looking for a Data Engineer who can design and build robust production data pipelines, distributed data processing systems, and high-quality data foundations for our AI platform. You will play a key role in ensuring that AI systems have fast, secure, and structured access to the data they need.
Your mission
Design and build scalable data pipelines for ingesting, transforming, and serving structured and unstructured data
Develop distributed data processing workflows to support AI features such as knowledge retrieval, automation workflows, and analytics
Build and maintain data ingestion systems connecting enterprise APIs, databases, file storage, and streaming sources
Model and optimize datasets for AI applications, including embeddings pipelines and vector indexing workflows
Ensure data reliability, consistency, and observability across pipelines and storage layers
Optimize query performance, data freshness, and cost efficiency in large-scale data systems
Work closely with AI engineers to enable Retrieval-Augmented Generation (RAG) and knowledge-based AI features
Implement batch and real-time processing mechanisms using queues, streaming systems, or event-driven architectures
Design and maintain data storage solutions including relational databases, object storage, and vector databases
Implement data governance practices including access control, lineage tracking, and monitoring
Collaborate with DevOps on containerized deployments, infrastructure automation, and cloud-based data services
Collaborate with our Data Research team
Your profile
Strong experience building production data pipelines
Good understanding of distributed systems and scalable data architectures
Experience working with large datasets and optimizing data performance
Familiarity with modern data tooling and workflow orchestration
Experience enabling data access for AI/ML or analytics use cases
Pragmatic engineering mindset with focus on reliability and maintainability
Ability to collaborate across backend, AI, and infrastructure teams
Curiosity about AI-driven data systems and emerging data infrastructure patterns
Tech Stack & Areas
SQL (minimum 5 years)
Hands-on experience with Python
PostgreSQL / analytical databases
Distributed processing concepts
Streaming / messaging systems
Data modeling & pipeline orchestration
Vector databases & embeddings pipelines
Cloud platforms (Azure, GCP, or AWS)
Docker & CI/CD
What we offer
Impact: Build intelligent production systems that redefine how companies use AI
Innovation: Work with cutting-edge frameworks and model ecosystems
Culture: Collaborative, creative, and ownership-driven team
Flexibility: Remote-first and flexible working hours
Growth: Access to AI resources, tools, and training
Participation: Virtual Stock Option Plan (VSOP)