Location: Lake Buena Vista, FL (onsite). Salary: USD 155,700 - 208,700 per year.

Join The Walt Disney Company as a Senior Machine Learning Engineer in a role focused on designing, deploying, and operating ML-driven self-healing automation across enterprise systems. You will build reusable ML frameworks and deliver AI-powered insights that improve reliability and observability. This position sits within Disney’s Enterprise Technology ecosystem, a world-class, technology-enabled environment that supports storytelling and unforgettable experiences across parks, resorts, media, and more.

Benefits and culture

Bonus and/or long-term incentive units
Medical benefits
Financial benefits

Role at a glance

The role centers on engineering and operationalizing machine learning solutions that surface leading indicators of failure, automate remediation, and enhance observability across distributed systems. You will design end-to-end ML workflows, collaborate with cross-functional teams, and contribute to scalable software patterns used across the enterprise.

Responsibilities

Collaborate with application, infrastructure, and operations teams to translate manual processes and business needs into ML-enabled solutions
Architect and implement reusable ML frameworks, patterns, and services that plug into enterprise automation and observability platforms
Design, train, and deploy models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and more to surface early failure indicators
Develop near-real-time inference pipelines that turn live telemetry (metrics, logs, traces, and events) into actionable insights
Create data abstractions and perform feature engineering on high-volume telemetry data
Assess model performance using production signals and iteratively improve accuracy and reliability
Build closed-loop, event-driven systems where model outputs trigger automated remediation actions
Partner with infrastructure and SRE teams to embed ML insights into tools, workflows, and dashboards
Analyze incidents and historical data to identify leading indicators and predictive signals
Own the full machine learning lifecycle: experimentation, validation, deployment, monitoring, and retraining
Decompose manual processes into reusable software modules leveraging ML models
Develop emulation and simulation environments (digital twins) of infrastructure to test AI automation under realistic scenarios
Create algorithms and frameworks to integrate ML/AI into the orchestration platform
Ensure service reliability, performance, and uptime through code-driven solutions
Conduct root cause analysis, design fault-tolerant architectures, and enable self-healing automation
Implement monitoring dashboards and KPIs to quantify automation and tooling performance
Collaborate with network engineers, software developers, ML engineers, and operations teams across the enterprise
Support the integration of commercial and open-source tools while keeping a vendor-agnostic approach

Requirements

7+ years of software engineering experience with a focus on automation, machine learning, and AI technologies
Hands-on experience building production-grade ML models and inference pipelines; proficiency with PyTorch, TensorFlow, scikit-learn, and similar frameworks
Experience designing, training, and deploying ML models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and related tasks in distributed environments
Hands-on experience building frontends, APIs, and backend functionality; strong proficiency in Python, JavaScript, TypeScript, Go, or Rust
Experience building emulation/simulation environments (digital twins) of infrastructure to test AI-driven automation
Strong experience building and deploying event-driven or streaming ML models in production
Solid foundation in statistics, data analysis, and applied ML techniques
Experience working with large-scale, real-world datasets that are noisy, incomplete, and evolving
Experience operationalizing models in distributed production environments
Ability to translate ambiguous operational problems into solvable ML use cases
Experience with modern cloud platforms, Kubernetes/Docker, identity/auth frameworks, and data/workflow orchestration
Experience with AI/ML technologies and data engineering concepts; experience building AI agents preferred
Proven success designing enterprise-scale systems and reusable software frameworks
Strong communication, collaboration, and leadership skills
Systems thinking and the ability to connect components into holistic solutions
Able to shift quickly between hands-on work and high-level strategy

Technologies

PyTorch, TensorFlow, scikit-learn, Python, JavaScript, TypeScript, Go, Rust, Kubernetes, Docker, AWS, Azure, GCP

Education

Required: Bachelor's degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or a related field, or equivalent work experience
Preferred: Master’s degree in Computer Science, Engineering, or a related discipline

Preferred qualifications

Certifications such as Kubernetes (CKA/CKAD), AWS/Azure/GCP, CCNP/DevNet, or NVIDIA AI engineer
Experience building low-code/no-code automation platforms or reusable developer toolkits
Contributions to open-source automation, ML, AI, observability, or DevOps communities
Experience with unsupervised and semi-supervised learning for anomaly detection
Expertise in complex event processing and event correlation
Experience with time-series forecasting for capacity, latency, and failure prediction
Experience with feature stores, offline/online feature pipelines, and feature reuse
Model monitoring for drift, bias, and performance degradation
Experience with reinforcement learning or decision models for automated remediation and optimization
Experience labeling, curating, and managing training data from production telemetry
Mentoring engineers, knowledge sharing, and fostering a learning culture
Curiosity and a continuous learning mindset in AI/ML, automation, and platform technologies

Team and department context

The Enterprise Technology team aims to deliver technology solutions that align with business strategies while enabling enterprise efficiency and collaborative innovation across the company. This role sits under the Director of Automation, Tooling, and Observability within Global Network Engineering & Operations and contributes to self-healing infrastructure management across production environments.
