
AI/ML Engineer
3B Staffing LLC, Scottsdale, AZ, United States
CVS
Location: Scottsdale, AZ (Hybrid 3 days a week from day 1)
Length of contract: 1 year (and will be renewed after that)
Rate: $65-67/hr C2C
About the Role
We are seeking an experienced AI/ML Engineer to design, build, and operate AI/ML infrastructure and agentic systems. This role involves developing MCP servers and agents, integrating LLMs, and implementing RAG pipelines for production environments.
Key Responsibilities
• Design, build and operate MCP servers and MCP agents that host, orchestrate and monitor AI/agent workloads.
• Develop agentic AI, prompt engineering patterns, LLM integrations and developer tooling for production use.
• Own deployment, scaling, reliability and cost-efficiency on Kubernetes/Docker and Google Cloud with automated CI/CD.
• Design and implement RAG (Retrieval-Augmented Generation) pipelines and integrations with vector stores and retrieval tooling; use LangChain and Langfuse for orchestration, chaining, and observability.
Core Responsibilities
• Implement and maintain MCP server and agent code, APIs, and SDKs for model access and agent orchestration.
• Design agent behavior, workflows and safety guards for agentic AI systems.
• Create, test and iterate prompt templates, evaluation harnesses and grounding/chain-of-thought strategies.
• Integrate LLMs and model providers (self-hosted and cloud APIs) with unified adapters and telemetry.
• Build developer tooling: CLI, local runner, simulators, and debugging tools for agents and prompts.
• Containerize services (Docker), manage orchestration (Kubernetes/GKE), and optimize nodes, autoscaling and resource requests.
• Ensure observability: logging, metrics, traces, dashboards, alerting and SLOs for model infra and agents.
• Create runbooks, playbooks and incident response procedures; reduce MTTR and perform postmortems.
• Design and maintain RAG workflows: document chunking, embeddings, vector indexing, retrieval strategies, re-ranking and context injection.
• Integrate and instrument LangChain for composable chains, agents and tooling; use Langfuse (or equivalent tracing) to capture prompts, model calls, RAG traces and evaluation telemetry.
Required Skills & Experience
• 5+ years of strong software engineering (Python/Node.js), system design and production service experience.
• 2+ years of experience with LLMs, prompt engineering, and agent frameworks.
• 2+ years of practical experience implementing RAG: embeddings, vector DBs and retrieval tuning.
• 2+ years of experience with LangChain patterns and with toolchain telemetry (Langfuse or similar) for prompt/model traceability.
• 5+ years of experience with Kubernetes, Docker, CI/CD and infrastructure-as-code.
• 2+ years of practical experience with Google Cloud Platform services.
• 2+ years of experience with observability, testing, and security best practices for distributed systems.
• 2+ years of experience evaluating and mitigating retrieval/augmentation failures, hallucinations, and leakage risks in RAG systems.
• Familiarity with vendor and open-source vector stores and embedding providers.
• Familiarity with CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI, or ArgoCD).