
Software Developer/Engineer (Mid Level experience)
Diversity Nexus, Phila, PA, United States
Position Type: Contract
Location: Philadelphia
Work Mode: Hybrid, minimum 3 days in the office
Interview Schedule: 1st interview, 1 hour, in person; 2nd interview, 1 hour, in person

Consultant Requirements - On-Prem LLM & Vector DB Implementation

Core Experience
- Hands-on experience deploying open-source LLMs such as Meta Llama 3 and Mistral / Mixtral in on-prem or private environments
- Strong proficiency in Python for LLM inference, prompt engineering, and integration
- Experience with CPU-based inference, model quantization, and performance tuning

Vector Databases & RAG
- Practical experience with open-source vector databases such as Qdrant, Chroma, Milvus, or pgvector
- Proven implementation of Retrieval-Augmented Generation (RAG) pipelines
- Experience generating and managing embeddings and metadata filtering

Security & Governance
- Understanding of data privacy, air-gapped deployments, and enterprise security requirements
- Experience implementing access controls and audit logging

Nice to Have
- Experience with LangChain or LlamaIndex
- Exposure to Rust, Go, or C++ for high-performance services
- Familiarity with Docker and Kubernetes for on-prem deployments
- Knowledge of inference frameworks (e.g., vLLM, llama.cpp, Hugging Face Transformers)
- Prior work in regulated or enterprise environments

Deliverables
- Reference architecture and deployment guidance
- Working prototype (LLM + vector DB + RAG)
- Documentation and knowledge transfer to internal teams