Sr. Lead, Machine Learning Engineer (Enterprise Platforms Technology)
Job Description
Capital One's Enterprise Platforms Technology (EPTech) team offers a path to productionize machine learning at scale, with a focus on architectural design, development, and governance of ML models and infrastructure. This onsite role in McLean, VA comes with a competitive annual compensation range of USD 229,900 to 262,400, plus a benefits package designed to support health, financial security, and total well-being.
Responsibilities
- Design, build, and deliver ML models and components that address real business needs, partnering with product managers and data science teams.
- Inform ML infrastructure decisions with a solid understanding of modeling techniques, including model choice, data and feature selection, training, hyperparameter tuning, dimensionality, bias vs. variance, and validation.
- Solve complex problems by writing and validating application code, developing ML models, and automating tests and deployment processes.
- Collaborate within a cross-functional Agile team to create and enhance software that enables state-of-the-art big data and ML applications.
- Retrain, maintain, and monitor models once they are in production.
- Leverage or build cloud-based architectures, technologies, and platforms to deliver optimized ML models at scale.
- Construct optimized data pipelines to feed ML models.
- Adopt continuous integration and continuous deployment practices, including test automation and monitoring, to ensure successful deployment of ML models and application code.
- Ensure code quality and governance to reduce vulnerabilities, maintain risk-aware practices for models, and follow Responsible and Explainable AI principles.
- Work with Python, Scala, or Java to implement solutions.
Requirements
- Bachelor’s degree required; Master’s or doctoral degree in computer science, electrical engineering, mathematics, or a related field preferred.
- At least 8 years of experience designing and building data-intensive solutions using distributed computing (internship experience does not apply).
- At least 4 years of experience programming with Python, Scala, or Java.
- At least 3 years of experience building, scaling, and optimizing ML systems.
- At least 2 years of experience leading teams developing ML solutions.
- Experience developing and deploying ML solutions in a public cloud such as AWS, Azure, or Google Cloud Platform.
- 4+ years of on-the-job experience with an industry-recognized ML framework such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow.
- 3+ years of experience developing performant, resilient, and maintainable code.
- 3+ years of experience with data gathering and preparation for ML models.
- 3+ years of people management experience.
- ML industry impact through conference presentations, papers, blog posts, open source contributions, or patents.
- 3+ years of experience building production-ready data pipelines that feed ML models.
- Ability to communicate complex technical concepts clearly to a variety of audiences.
Technologies
- Python, Scala, Java
- scikit-learn, PyTorch, Dask, Spark, TensorFlow
- AWS, Azure, Google Cloud Platform
Benefits
- Health benefits
- Financial benefits
- Incentives (cash bonus and/or long-term incentives)
- Other benefits that support total well-being