Capital One is looking for a Lead Machine Learning Engineer in New York City to spearhead MLOps initiatives, oversee KServe-powered pipelines, and design scalable Kubernetes-based environments for PyTorch and TensorFlow workloads on AWS. The role sits at the intersection of advanced modeling and production engineering, collaborating closely with Product and Data Science teams to translate analytics into reliable, scalable solutions.

Responsibilities

Design, build, and deliver ML models and components that address real business needs, partnering with Product and Data Science colleagues.
Guide ML infrastructure decisions by applying modeling insights, including model selection, data and feature choices, training workflows, hyperparameter tuning, dimensionality, bias/variance considerations, and validation.
Tackle complex problems through coding, model development, validation, and automation of tests and deployment processes.
Work within a cross-functional Agile team to create and enhance software for cutting-edge big data and ML applications.
Retrain, maintain, and monitor models in production to ensure ongoing performance.
Use or build cloud-native architectures and platforms to deliver optimized ML models at scale.
Construct efficient data pipelines that feed ML models with quality data.
Apply continuous integration and continuous deployment best practices, including test automation and monitoring, to ensure successful deployment of models and code.
Ensure code quality and governance, and adhere to Responsible and Explainable AI practices.
Program primarily in Python, Scala, or Java.

Requirements

Bachelor’s degree
At least 6 years of experience designing and building data-intensive solutions using distributed computing (Internship experience does not apply)
At least 4 years of experience programming with Python, Scala, or Java
At least 2 years of experience building, scaling, and optimizing ML systems

Technologies

Python
Scala
Java
PyTorch
TensorFlow
scikit-learn
Dask
Spark
Kubernetes
KServe
AWS
Azure
Google Cloud Platform

Benefits

Health benefits
Financial benefits
Incentives (cash bonuses and/or long-term incentives)

Preferred Qualifications

Master’s or doctoral degree in computer science, electrical engineering, mathematics, or a related field
3+ years of experience building production-ready data pipelines that feed ML models
3+ years of on-the-job experience with an industry-recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow
2+ years of experience developing performant, resilient, and maintainable code
2+ years of experience gathering and preparing data for ML models
2+ years of people leadership experience
1+ years of experience leading teams that develop ML solutions using industry best practices, patterns, and automation
Experience deploying ML solutions in public cloud environments like AWS, Azure, or Google Cloud
Experience designing and scaling complex data pipelines for ML models and evaluating their performance
Notable ML industry impact through conference talks, papers, open source contributions, or patents

Lead Machine Learning Engineer (MLOps, KServe + building Kubernetes Clusters, PyTorch, TensorFlow on AWS)
