Responsibilities

Design, implement, and optimize LLM-driven agents encompassing planning, tool usage, workflow orchestration, and multi-step reasoning
Design memory architectures covering short-term and long-term memory, context handling, and session state
Build and refine Retrieval-Augmented Generation pipelines for relevance, grounding, freshness, and retrieval quality
Design and operate vector-store infrastructure (for example pgvector, Milvus, Qdrant, or Weaviate)
Define evaluation methodologies for agents, prompts, and workflows
Enhance end-to-end agent quality, latency, reliability, and operating cost
Develop and maintain production inference services that are low-latency, high-concurrency, and dependable
Support online-learning models such as contextual bandits and reinforcement learning policies with real-time inference and online parameter updates
Deploy and optimize AI inference systems for latency, throughput, reliability, and resource efficiency
Analyze and address bottlenecks in inference serving
Assist with the deployment and serving of recommendation, ranking, and reinforcement learning models developed by research scientists
Apply lightweight adaptation techniques (such as LoRA, QLoRA, PEFT) when appropriate for domain needs
Build and maintain deployment pipelines, observability systems, and tracing for agents and serving endpoints
Monitor quality regressions, performance degradation, and model drift
Maintain version control for models, prompts, datasets, and agent configurations
Contribute to automated validation, testing, and CI/CD workflows for AI systems
Collaborate with research scientists, backend engineers, and data scientists to integrate AI into production products
Document systems, best practices, and internal tooling
Contribute to engineering standards and operational excellence across AI initiatives

Requirements

Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related field
3+ years of industry experience in Machine Learning Engineering or related roles
Strong software and systems engineering background with experience delivering low-latency, reliable production services in languages such as Go, Rust, C++, or equivalent
Hands-on experience building or supporting real-time inference systems for recommendations, ranking, contextual bandits, reinforcement learning, or similar adaptive ML applications
Proficiency with PyTorch and the Hugging Face ecosystem
Experience creating production LLM or agent applications using frameworks such as LangGraph or LlamaIndex
Practical experience with RAG systems, embeddings, and vector databases
Experience evaluating and monitoring LLM or agent systems in production
Experience deploying and optimizing production machine learning or LLM systems
Understanding of inference runtime behavior, resource usage, latency optimization, and production serving performance
Experience with Docker and Kubernetes
Experience with cloud platforms such as AWS, GCP, or Azure
Fluent Mandarin Chinese

Technologies

Go, Rust, C++
PyTorch, Hugging Face
LangGraph, LlamaIndex
pgvector, Milvus, Qdrant, Weaviate
Docker, Kubernetes
AWS, GCP, Azure
LoRA, QLoRA, PEFT
CUDA, OpenAI Triton, TFLite, CoreML
FSDP, DeepSpeed, Spark, Hadoop

Benefits

401(k)
401(k) matching
Dental insurance
Health insurance
Life insurance
Paid time off
Parental leave
Retirement plan
Vision insurance

Pay

From $130,000 per year

Location and Onsite Details

Onsite in Irvine, California 92618

Relocation

Relocation to Irvine, CA 92618 is required before the start date

Work Arrangement

In person

Machine Learning Engineer (Agent & Inference) - (Chinese Mandarin Speaker)

