Machine Learning Engineer (Agent & Inference) - (Mandarin Chinese Speaker)
Job Description
Responsibilities
- Design, implement, and optimize LLM-driven agents encompassing planning, tool usage, workflow orchestration, and multi-step reasoning
- Design memory architectures covering short-term and long-term memory, context handling, and session state
- Build and refine Retrieval-Augmented Generation pipelines for relevance, grounding, freshness, and retrieval quality
- Design and operate vector-store infrastructure (for example, pgvector, Milvus, Qdrant, Weaviate)
- Define evaluation methodologies for agents, prompts, and workflows
- Improve end-to-end agent quality, latency, and reliability while reducing operating cost
- Develop and maintain production inference services that are low-latency, high-concurrency, and dependable
- Support online-learning models such as contextual bandits and reinforcement learning policies with real-time inference and online parameter updates
- Deploy and optimize AI inference systems for latency, throughput, reliability, and resource efficiency
- Analyze and address bottlenecks in inference serving
- Assist with the deployment and serving of recommendation, ranking, and reinforcement learning models developed by research scientists
- Apply parameter-efficient fine-tuning (PEFT) techniques such as LoRA and QLoRA when appropriate for domain needs
- Build and maintain deployment pipelines, observability systems, and tracing for agents and serving endpoints
- Monitor quality regressions, performance degradation, and model drift
- Maintain version control for models, prompts, datasets, and agent configurations
- Contribute to automated validation, testing, and CI/CD workflows for AI systems
- Collaborate with research scientists, backend engineers, and data scientists to integrate AI into production products
- Document systems, best practices, and internal tooling
- Contribute to engineering standards and operational excellence across AI initiatives
Requirements
- Bachelor’s or Master’s degree in Computer Science, Machine Learning, or a related field
- 3+ years of industry experience in Machine Learning Engineering or related roles
- Strong software and systems engineering background with experience delivering low-latency, reliable production services in languages such as Go, Rust, C++, or equivalent
- Hands-on experience building or supporting real-time inference systems for recommendations, ranking, contextual bandits, reinforcement learning, or similar adaptive ML applications
- Proficiency with PyTorch and the Hugging Face ecosystem
- Experience creating production LLM or agent applications using frameworks such as LangGraph or LlamaIndex
- Practical experience with RAG systems, embeddings, and vector databases
- Experience evaluating and monitoring LLM or agent systems in production
- Experience deploying and optimizing production machine learning or LLM systems
- Understanding of inference runtime behavior, resource usage, latency optimization, and production serving performance
- Experience with Docker and Kubernetes
- Experience with cloud platforms such as AWS, GCP, or Azure
- Fluent Mandarin Chinese
Technologies
- Go, Rust, C++
- PyTorch, Hugging Face
- LangGraph, LlamaIndex
- pgvector, Milvus, Qdrant, Weaviate
- Docker, Kubernetes
- AWS, GCP, Azure
- LoRA, QLoRA, PEFT
- CUDA, OpenAI Triton, TFLite, Core ML
- FSDP, DeepSpeed, Spark, Hadoop
Benefits
- 401(k)
- 401(k) matching
- Dental insurance
- Health insurance
- Life insurance
- Paid time off
- Parental leave
- Retirement plan
- Vision insurance
Pay
From $130,000 per year
Location and Onsite Details
Onsite in Irvine, California 92618
Relocation
Relocation to Irvine, CA 92618 before starting is required
Work Arrangement
In person