DataJobs.io

Job Description

Responsibilities

  • Design, implement, and optimize LLM-driven agents encompassing planning, tool usage, workflow orchestration, and multi-step reasoning
  • Design memory architectures covering short-term and long-term memory, context handling, and session state
  • Build and refine Retrieval-Augmented Generation pipelines for relevance, grounding, freshness, and retrieval quality
  • Design and operate vector-store infrastructure (e.g., pgvector, Milvus, Qdrant, Weaviate)
  • Define evaluation methodologies for agents, prompts, and workflows
  • Enhance end-to-end agent quality, latency, reliability, and operating cost
  • Develop and maintain production inference services that are low-latency, high-concurrency, and dependable
  • Support online-learning models such as contextual bandits and reinforcement learning policies with real-time inference and online parameter updates
  • Deploy and optimize AI inference systems for latency, throughput, reliability, and resource efficiency
  • Analyze and address bottlenecks in inference serving
  • Assist with the deployment and serving of recommendation, ranking, and reinforcement learning models developed by research scientists
  • Apply lightweight adaptation techniques (such as LoRA, QLoRA, PEFT) when appropriate for domain needs
  • Build and maintain deployment pipelines, observability systems, and tracing for agents and serving endpoints
  • Monitor quality regressions, performance degradation, and model drift
  • Maintain version control for models, prompts, datasets, and agent configurations
  • Contribute to automated validation, testing, and CI/CD workflows for AI systems
  • Collaborate with research scientists, backend engineers, and data scientists to integrate AI into production products
  • Document systems, best practices, and internal tooling
  • Contribute to engineering standards and operational excellence across AI initiatives

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related field
  • 3+ years of industry experience in Machine Learning Engineering or related roles
  • Strong software and systems engineering background with experience delivering low-latency, reliable production services in languages such as Go, Rust, C++, or equivalent
  • Hands-on experience building or supporting real-time inference systems for recommendations, ranking, contextual bandits, reinforcement learning, or similar adaptive ML applications
  • Proficiency with PyTorch and the Hugging Face ecosystem
  • Experience creating production LLM or agent applications using frameworks such as LangGraph or LlamaIndex
  • Practical experience with RAG systems, embeddings, and vector databases
  • Experience evaluating and monitoring LLM or agent systems in production
  • Experience deploying and optimizing production machine learning or LLM systems
  • Understanding of inference runtime behavior, resource usage, latency optimization, and production serving performance
  • Experience with Docker and Kubernetes
  • Experience with cloud platforms such as AWS, GCP, or Azure
  • Fluency in Mandarin Chinese

Technologies

  • Go, Rust, C++
  • PyTorch, Hugging Face
  • LangGraph, LlamaIndex
  • pgvector, Milvus, Qdrant, Weaviate
  • Docker, Kubernetes
  • AWS, GCP, Azure
  • LoRA, QLoRA, PEFT
  • CUDA, OpenAI Triton, TFLite, CoreML
  • FSDP, DeepSpeed, Spark, Hadoop

Benefits

  • 401(k)
  • 401(k) matching
  • Dental insurance
  • Health insurance
  • Life insurance
  • Paid time off
  • Parental leave
  • Retirement plan
  • Vision insurance

Pay

From $130,000 per year

Location and Onsite Details

Onsite in Irvine, California 92618

Relocation

Relocation to Irvine, CA 92618 is required before the start date

Work Arrangement

In person
