DataJobs.io

Job Description

This onsite Senior Data Engineer role in Austin, TX, focuses on building analytics-ready data platforms to power observability, reliability analysis, and capacity forecasting for NVIDIA's EDA datacenters. The position transforms large-scale telemetry and observability data into trusted datasets that enable data scientists, analysts, and engineers to derive insights across global CPU and GPU compute clusters.

Responsibilities

  • Architect, implement, and sustain analytics-focused data pipelines that ingest, transform, and curate observability data from EDA datacenters.
  • Develop reliable ingestion pipelines for metrics, logs, traces, and hardware health telemetry produced by large-scale CPU and GPU clusters.
  • Collaborate with observability engineers to merge data from Prometheus, Grafana, Elastic/OpenSearch, and Spark-based platforms into unified analytical datasets.
  • Model and organize data to support exploratory analysis, reliability modeling, forecasting, and long-term trend analysis.
  • Build and optimize batch and streaming workflows that enable both near-real-time analytics and historical analysis.
  • Implement data quality checks, validation frameworks, and monitoring to ensure analytical accuracy and consistency.
  • Define data retention, aggregation, and enrichment strategies balancing analytical needs, system performance, and storage costs.
  • Enable self-service analytics by improving data discoverability, documentation, and usability.
  • Collaborate with data scientists and analysts to understand analytical requirements and evolve datasets to support new models and insights.
  • Continuously improve pipeline scalability, reliability, and performance as datacenter footprint and workload complexity grow.

Requirements

  • BS in Computer Science or a related field (MS preferred), or equivalent experience, with at least 5 years of experience designing, building, and operating large-scale data pipelines and data platforms for distributed systems or infrastructure data.
  • Proficiency in Python and SQL, with a track record of supporting analytical and exploratory workloads.
  • Hands-on experience with distributed data processing frameworks such as Spark or similar technologies.
  • Experience working with observability and telemetry data, including metrics, logs, traces, and time-series data.
  • Experience designing data models and schemas that support flexible analysis and forecasting.
  • Ability to take ownership of data engineering initiatives and drive them end-to-end in collaboration with cross-functional partners.
  • Experience implementing data quality, validation, and monitoring for analytics pipelines.
  • Strong communication and collaboration skills, particularly when working with engineering and infrastructure teams.
  • Adaptability in fast-paced environments with evolving analytical and operational needs.

Technologies

  • Python
  • SQL
  • Spark
  • Prometheus
  • Grafana
  • Elastic/OpenSearch
  • Kafka

Benefits

  • Equity and benefits

Ways to Stand Out

  • Experience supporting datacenter infrastructure analytics, hardware reliability programs, or workload performance analysis.
  • Familiarity with EDA workflows, HPC environments, or GPU-accelerated compute platforms.
  • Experience integrating or operating observability stacks such as Prometheus, Grafana, Elastic/OpenSearch, Kafka, Spark, or similar tools.
