Staff Machine Learning Engineer
Job Description
Staff Machine Learning Engineer at Home Depot / THD in Atlanta, GA (remote); salary $120,000–$190,000 per year; minimum 3 years of experience; High School Diploma or GED.
Responsibilities
- Partner with cross-functional teams, including UX, engineering, and product management, to design secure, reliable, and scalable ML solutions
- Ensure user stories are developer-ready, clear, and testable in collaboration with the Product Team
- Tune off-the-shelf software to meet evolving business needs
- Build dashboards, logging, alerts, and response workflows to detect and address issues proactively
- Engage in learning activities on modern software design, machine learning, and core development practices via communities of practice
- Regularly review external articles, tutorials, and videos to stay current with new technologies and best practices
- Attend conferences to evaluate and adopt new innovations when appropriate
- Analyze business trends and behavioral data to identify opportunities for improvement and new initiatives
- Lead the evaluation, development, and recommendation of technology products and platforms to deliver cost-effective solutions
- Research and design suitable infrastructure, networking, databases, security, and ML architectures for products
- Create and maintain monitoring and support tools
- Contribute to project planning and manage multiple initiatives
- Develop formal training programs
- Answer questions from other product or support teams
- Monitor tooling and foster collaboration across product teams
- Provide production support for deployed software
- Track production service level objectives and product performance
- Periodically assess production performance and capacity across code, infrastructure, data, messaging, and model quality
Requirements
- Must be eighteen years of age or older
- Must be legally permitted to work in the United States
Minimum Qualifications
- High School Diploma or GED
- At least 3 years of relevant work experience
Preferred Qualifications
- 3 to 6 years of relevant work experience
- Proven experience designing, training, evaluating, and deploying ML models in production, including batch and real-time inference
- Hands-on ML lifecycle management experience covering feature engineering, model versioning, experimentation, validation, and monitoring for data drift and performance degradation
- Experience building and operating ML pipelines with cloud-native services, data platforms, and CI/CD practices for reproducible deployments
- Solid understanding of applied statistics, model evaluation metrics, and tradeoffs among accuracy, interpretability, latency, and operational cost
- Experience with clustering, forecasting, anomaly detection, and neural networks
- Foundation in basic statistics and regression algorithms
- Exposure to advanced ML techniques such as NLP, CNNs, autoencoders, and embeddings
- Experience training ML models on very large datasets
- Proficiency with data analysis and ML tools like Jupyter, Pandas, SciPy, Scikit-learn, Gensim, TensorFlow, and PyTorch, and integrating them into scalable systems
- Experience with Google Cloud Platform and AI/ML components such as Vertex AI, BigQuery ML, and AutoML
- Familiarity with data engineering practices and big data platforms such as BigQuery and Datastore
- Proficiency in a modern scripting language, preferably Python
- Experience writing SQL queries against relational databases
- Experience with version control systems, preferably Git
- Experience in Linux or Unix environments
- Experience with CI/CD toolchains
- Experience with REST and well-designed web services
- Experience designing production systems with high availability, disaster recovery, performance, efficiency, and security in mind
- Experience with cloud computing platforms and automation patterns for ML services
- Familiarity with defensive coding patterns for high availability
- Experience with A/B testing and scalable REST-based web services design
- Familiarity with advanced ML architectures such as GANs, GRUs, LSTMs, RNNs, CNNs, and style transfer
Technologies
- Python
- SQL
- Git
- Linux
- Unix
- Google Cloud Platform
- Vertex AI
- BigQuery
- BigQuery ML
- AutoML
- Jupyter Notebooks
- Pandas
- SciPy
- Scikit-learn
- Gensim
- TensorFlow
- PyTorch
- REST
- CI/CD
- Datastore
Reporting and Team Structure
- Typically reports to Software Engineer Manager or Senior Software Engineer Manager
- No direct reports
Travel Requirements
- Typically requires overnight travel 5% to 20% of the time
Physical Requirements
- Most of the time is spent sitting in a comfortable position, with occasional light movement or lifting
Working Conditions
- Indoor office environment with generally comfortable conditions; unpleasant conditions are infrequent