Java Developer - Senior

Artech, Alpharetta, GA, United States

Job title: AI/ML Engineer - Python/Java
Job location: Alpharetta, GA (Hybrid)
Contract: 12 Months
ATTJP00040163

*** On-site position 3-4 days/week in Alpharetta, GA.
*** Hours are typically 8:30 AM to 4:30 PM, 40 hours/week.

TOP 5 SKILLS REQUIRED:
1. 5-7 years of experience in Python and 1-3 years of experience in Java
2. Agentic/LLM engineering experience using the LangGraph agentic framework
3. RAG / knowledge graphs
4. AWS and Kubernetes
5. Experience with AI observability platforms (LangSmith / Langfuse)

ADDITIONAL REQUIREMENTS:
Core Software Engineering Skills (Must Have)
- Strong coding skills in Python (typically the primary language) and/or TypeScript or Java.
- Solid fundamentals: data structures, APIs, concurrency/async, error handling, clean architecture.
- Experience building microservices and integrating with internal/external APIs.
- Familiarity with CI/CD, automated testing, code reviews, and release management.

Agentic / LLM Engineering Skills (Must Have)
- Designing and implementing agent workflows: planning → tool selection → execution → verification.
- Tool/function calling patterns and building reliable tool interfaces (idempotency, retries, timeouts).
- Prompt engineering, prompt version management, and safe prompt templating.
- Handling non-determinism: evaluations, guardrails, deterministic fallbacks, and replayable runs.
- Building multi-step orchestration (state machines, DAGs, workflow engines, or agent frameworks).

Retrieval + Knowledge Integration (Often Required)
- Building RAG pipelines: chunking, embeddings, indexing, retrieval strategies, reranking.
- Working knowledge of vector databases (e.g., Pinecone, pgvector, FAISS, Weaviate) and search.
- Grounding and citation approaches; freshness-aware and permission-aware retrieval.

Reliability, Safety, and Governance
- Implementing guardrails: PII redaction, prompt injection defenses, policy filters, allow/deny tool lists.
- Designing for observability: tracing agent steps, tool calls, latency, token/cost metrics.
- Building robust fallbacks (rule-based flows, smaller models, cached answers, human escalation).
- Secure handling of secrets and credentials; least-privilege tool access.

Evaluation & Quality (Critical for Agentic Systems)
- Creating evaluation suites: golden tasks, regression sets, scenario tests, adversarial tests.
- Defining success metrics (task completion rate, groundedness, hallucination rate, latency, cost).
- Experience with A/B testing or online evaluation in production.

Platform/Infrastructure (Preferred)
- Cloud experience (AWS/Azure/GCP), containers (Docker), optionally Kubernetes.
- Familiarity with scalable data pipelines and message queues (Kafka, SQS, Pub/Sub) for async agent work.
- Experience optimizing inference costs/latency (model choice, batching, caching, token reduction).

Nice-to-Haves
- Experience with frameworks like LangChain, LlamaIndex, or workflow engines (Temporal, Step Functions, Airflow).
- Knowledge of security engineering relevant to LLM apps (prompt injection, data exfiltration patterns).
- Domain expertise in the product area (e.g., telecom operations, network management, customer support).

Soft Skills
- Strong cross-functional collaboration with product, security, and platform teams.
- Ability to translate business workflows into agent capabilities and measurable outcomes.
- Comfortable operating in ambiguity and iterating from prototype → hardened production.

"Artech is an Equal Opportunity Employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, or any other characteristic protected by law. We are committed to fostering a diverse and inclusive workplace where all employees feel valued and respected."