DataJobs.io

Job Description

Location: Lake Buena Vista, FL (onsite). Salary: USD 155,700 - 208,700 per year.

Join The Walt Disney Company as a Senior Machine Learning Engineer in a role focused on designing, deploying, and operating ML-driven self-healing automation across enterprise systems. You will build reusable ML frameworks and deliver AI-powered insights to improve reliability and observability. This position sits within Disney’s Enterprise Technology ecosystem, a world-class, technology-enabled environment that supports storytelling and unforgettable experiences across parks, resorts, media, and more.

Benefits and culture

  • Bonus and/or long-term incentive units
  • Medical benefits
  • Financial benefits

Role at a glance

The role centers on engineering and operationalizing machine learning solutions that surface leading indicators of failure, automate remediation, and enhance observability across distributed systems. You will design end-to-end ML workflows, collaborate with cross-functional teams, and contribute to scalable software patterns used across the enterprise.

Responsibilities

  • Collaborate with applications, infrastructure and operations teams to translate manual processes and business needs into ML-enabled solutions
  • Architect and implement reusable ML frameworks, patterns, and services that plug into enterprise automation and observability platforms
  • Design, train, and deploy models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and more to surface early failure indicators
  • Develop near-real-time inference pipelines that turn live telemetry (metrics, logs, traces, and events) into actionable insights
  • Create data abstractions and perform feature engineering on high-volume telemetry data
  • Assess model performance using production signals and iteratively improve accuracy and reliability
  • Build closed-loop, event-driven systems where model outputs trigger automated remediation actions
  • Partner with infrastructure and SRE teams to embed ML insights into tools, workflows, and dashboards
  • Analyze incidents and historical data to identify leading indicators and predictive signals
  • Own the full machine learning lifecycle: experimentation, validation, deployment, monitoring, and retraining
  • Decompose manual processes into reusable software modules leveraging ML models
  • Develop emulation and simulation environments (digital twins) of infrastructure to test AI automation under realistic scenarios
  • Create algorithms and frameworks to integrate ML/AI into the orchestration platform
  • Ensure service reliability, performance and uptime through code-driven solutions
  • Conduct root cause analysis, design fault-tolerant architectures, and enable self-healing automation
  • Implement monitoring dashboards and KPIs to quantify automation and tooling performance
  • Collaborate with network engineers, software developers, ML engineers, and operations teams across the enterprise
  • Support the integration of commercial and open-source tools while keeping a vendor-agnostic approach

Requirements

  • 7+ years of software engineering experience with a focus on automation, machine learning, and AI technologies
  • Hands-on experience building production-grade ML models and inference pipelines; proficient with PyTorch, TensorFlow, Scikit-learn, and similar frameworks
  • Experience designing, training, and deploying ML models for anomaly detection, forecasting, predictive analytics, event correlation, pattern recognition, classification, causal analysis, and related tasks in distributed environments
  • Hands-on experience building frontends, APIs, and backend functionality; strong proficiency in Python, JavaScript, TypeScript, Go, or Rust
  • Experience building emulation/simulation environments (digital twins) of infrastructure to test AI-driven automation
  • Strong experience building and deploying event-driven or streaming ML models in production
  • Solid foundation in statistics, data analysis, and applied ML techniques
  • Experience working with large-scale, real-world datasets that are noisy, incomplete, and evolving
  • Experience operationalizing models in distributed production environments
  • Ability to translate ambiguous operational problems into solvable ML use cases
  • Experience with modern cloud platforms, Kubernetes/Docker, identity/auth frameworks, and data/workflow orchestration
  • Experience with AI/ML technologies and data engineering concepts; experience building AI agents is preferred
  • Proven success designing enterprise-scale systems and reusable software frameworks
  • Strong communication, collaboration and leadership skills
  • Systems thinking and the ability to connect components into holistic solutions
  • Able to shift quickly between hands-on work and high-level strategy

Technologies

PyTorch, TensorFlow, Scikit-learn, Python, JavaScript, TypeScript, Go, Rust, Kubernetes, Docker, AWS, Azure, GCP

Education

  • Required: Bachelor's degree in Computer Science, Information Systems, Software Engineering, Electrical or Electronics Engineering, or a related field, or equivalent work experience
  • Preferred: Master’s degree in Computer Science, Engineering, or a related discipline

Preferred qualifications

  • Certifications such as Kubernetes (CKA/CKAD), AWS/Azure/GCP, CCNP/DevNet, or NVIDIA AI engineer
  • Experience building low-code/no-code automation platforms or reusable developer toolkits
  • Contributions to open-source automation, ML, AI, observability, or DevOps communities
  • Experience with unsupervised and semi-supervised learning for anomaly detection
  • Expertise in complex event processing and event correlation
  • Experience with time-series forecasting for capacity, latency, and failure prediction
  • Experience with feature stores, offline/online feature pipelines, and feature reuse
  • Model monitoring for drift, bias, and performance degradation
  • Experience with reinforcement learning or decision models for automated remediation and optimization
  • Experience labeling, curating, and managing training data from production telemetry
  • Mentoring engineers, knowledge sharing, and fostering a learning culture
  • Curiosity and a continuous learning mindset in AI/ML, automation, and platform technologies

Team and department context

The Enterprise Technology team aims to deliver technology solutions that align with business strategies while enabling enterprise efficiency and cross-company collaborative innovation. The Machine Learning / Software Engineer works under the Director of Automation, Tooling, and Observability within Global Network Engineering & Operations, contributing to self-healing infrastructure management across production environments.