Senior Lead Machine Learning Engineer
Job Description
This is an onsite Senior Lead Machine Learning Engineer role at Capital One in Richmond, VA, focused on productionizing machine learning applications at scale through architecture, design, and deployment within Agile teams.
Responsibilities
- Design and deploy machine learning models and components to address real business needs, collaborating with Product and Data Science teams.
- Make ML infrastructure decisions informed by modeling considerations, including model type, data and feature selection, training, hyperparameter tuning, dimensionality, bias/variance, and validation.
- Tackle complex problems by writing and testing production code, building and validating models, and automating tests and deployment.
- Collaborate in a cross-functional Agile team to build and improve the software that powers advanced big data and ML workloads.
- Retrain, maintain, and monitor models in production to sustain performance.
- Leverage or develop cloud-based architectures and platforms to deliver scalable ML models.
- Build efficient data pipelines to feed ML models.
- Apply CI/CD best practices, including test automation and monitoring, to enable reliable deployment of ML models and code.
- Maintain secure, well-governed code and models, following Responsible and Explainable AI practices.
- Implement solutions in Python, Scala, or Java.
Requirements
- Bachelor's degree.
- 8+ years designing and building data-intensive solutions using distributed computing (internship experience not counted).
- 4+ years programming in Python, Scala, or Java.
- 3+ years building, scaling, and optimizing ML systems.
- 2+ years leading teams delivering ML solutions.
Technologies
- Python
- Scala
- Java
- scikit-learn
- PyTorch
- Dask
- Spark
- TensorFlow
- AWS
- Azure
- Google Cloud Platform
Benefits
- Health benefits
- Financial benefits
- Performance-based incentives, including cash bonuses and long-term incentives
Preferred Qualifications
- Master's or doctoral degree in computer science, electrical engineering, mathematics, or a related field.
- Experience developing and deploying ML solutions on public clouds such as AWS, Azure, or Google Cloud.
- 4+ years of hands-on experience with industry-standard ML frameworks (scikit-learn, PyTorch, Dask, Spark, or TensorFlow).
- 3+ years writing performant, resilient, and maintainable code.
- 3+ years performing data gathering and preparation for ML models.
- 3+ years of people management experience.
- Contributions to the ML field through conference talks, papers, blogs, open source, or patents.
- 3+ years building production-ready data pipelines that feed ML models.
- Ability to clearly communicate complex technical concepts to diverse audiences.