Sr Machine Learning Engineer
Job Description
Location: Lake Buena Vista, FL (onsite). Salary: USD 155,700 - 208,700 per year.
Join The Walt Disney Company as a Senior Machine Learning Engineer in a role focused on designing, deploying, and operating ML-driven self-healing automation across enterprise systems. You will build reusable ML frameworks and deliver AI-powered insights to improve reliability and observability. This position sits within Disney’s Enterprise Technology ecosystem, a world-class technology-enabled environment that supports storytelling and unforgettable experiences across parks, resorts, media, and more.
Benefits and culture
- Bonus and/or long-term incentive units
- Medical benefits
- Financial benefits
Role at a glance
The role centers on engineering and operationalizing machine learning solutions that surface leading indicators of failure, automate remediation, and enhance observability across distributed systems. You will design end-to-end ML workflows, collaborate with cross-functional teams, and contribute to scalable software patterns used across the enterprise.
Responsibilities
- Collaborate with applications, infrastructure, and operations teams to translate manual processes and business needs into ML-enabled solutions
- Architect and implement reusable ML frameworks, patterns, and services that plug into enterprise automation and observability platforms
- Design, train, and deploy models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and more to surface early failure indicators
- Develop near real-time inference pipelines that convert live telemetry (metrics, logs, traces, and events) into actionable insights
- Create data abstractions and perform feature engineering on high-volume telemetry data
- Assess model performance using production signals and iteratively improve accuracy and reliability
- Build closed-loop, event-driven systems where model outputs trigger automated remediation actions
- Partner with infrastructure and SRE teams to embed ML insights into tools, workflows, and dashboards
- Analyze incidents and historical data to identify leading indicators and predictive signals
- Own the full machine learning lifecycle: experimentation, validation, deployment, monitoring, and retraining
- Decompose manual processes into reusable software modules leveraging ML models
- Develop emulation and simulation environments (digital twins) of infrastructure to test AI automation under realistic scenarios
- Create algorithms and frameworks to integrate ML/AI into the orchestration platform
- Ensure service reliability, performance, and uptime through code-driven solutions
- Conduct root cause analysis, design fault-tolerant architectures, and enable self-healing automation
- Implement monitoring dashboards and KPIs to quantify automation and tooling performance
- Collaborate with network engineers, software developers, ML engineers, and operations teams across the enterprise
- Support the integration of commercial and open-source tools while keeping a vendor-agnostic approach
Requirements
- 7+ years of software engineering experience with a focus on automation, machine learning, and AI technologies
- Hands-on experience building production-grade ML models and inference pipelines; proficient with PyTorch, TensorFlow, Scikit-learn, and similar frameworks
- Experience designing, training, and deploying ML models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and related tasks in distributed environments
- Hands-on experience building frontends, APIs, and backend functionality; strong proficiency with Python, JavaScript, TypeScript, Go, or Rust
- Experience building emulation/simulation environments (digital twins) of infrastructure to test AI-driven automation
- Strong experience building and deploying event-driven or streaming ML models in production
- Solid foundation in statistics, data analysis, and applied ML techniques
- Experience working with large-scale, real-world datasets that are noisy, incomplete, and evolving
- Experience operationalizing models in distributed production environments
- Ability to translate ambiguous operational problems into solvable ML use cases
- Experience with modern cloud platforms, Kubernetes/Docker, identity/auth frameworks, and data/workflow orchestration
- Experience with AI/ML technologies and data engineering concepts; experience building AI agents is preferred
- Proven success designing enterprise-scale systems and reusable software frameworks
- Strong communication, collaboration, and leadership skills
- Systems thinking and the ability to connect components into holistic solutions
- Able to shift quickly between hands-on work and high-level strategy
Technologies
PyTorch, TensorFlow, Scikit-learn, Python, JavaScript, TypeScript, Go, Rust, Kubernetes, Docker, AWS, Azure, GCP
Education
- Required: Bachelor's degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or a related field, or equivalent work experience
- Preferred: Master’s degree in Computer Science, Engineering, or a related discipline
Preferred qualifications
- Certifications such as Kubernetes (CKA/CKAD), AWS/Azure/GCP, CCNP/DevNet, or NVIDIA AI engineer
- Experience building low-code/no-code automation platforms or reusable developer toolkits
- Contributions to open-source automation, ML, AI, observability, or DevOps communities
- Experience with unsupervised and semi-supervised learning for anomaly detection
- Expertise in complex event processing and event correlation
- Experience with time-series forecasting for capacity, latency, and failure prediction
- Experience with feature stores, offline/online feature pipelines, and feature reuse
- Experience with model monitoring for drift, bias, and performance degradation
- Experience with reinforcement learning or decision models for automated remediation and optimization
- Experience labeling, curating, and managing training data from production telemetry
- Mentoring engineers, knowledge sharing, and fostering a learning culture
- Curiosity and a continuous learning mindset in AI/ML, automation, and platform technologies
Team and department context
The Enterprise Technology team aims to deliver technology solutions that align with business strategies while enabling enterprise efficiency and cross-company collaborative innovation. The Senior Machine Learning Engineer works under the Director of Automation, Tooling, and Observability within Global Network Engineering & Operations, contributing to self-healing infrastructure management across production environments.