Lead Scientific Data Engineer
Job Description
Join Lawrence Berkeley National Laboratory’s Joint Genome Institute in the San Francisco Bay Area with hybrid work options, a supportive culture, and a competitive compensation package. This role offers a salary range of USD 158,808 to 267,996 per year, strong health and retirement benefits, and a commitment to work-life balance and professional growth.
We are seeking a Lead Scientific Data Engineer to provide senior technical leadership for core genomic data systems, data management, job orchestration, and platform integration that drive AI-enabled scientific discovery.
Responsibilities
- Provide senior technical leadership for JGI's core scientific data and compute platforms by crafting implementation roadmaps, data system architectures, and long-term strategy.
- Design and implement automated production systems, APIs, and workflows that enable genomic data movement, metadata management, job orchestration, data access, and large-scale scientific computing.
- Improve reliability, scalability, observability, interoperability, and maintainability of shared production data systems while supporting sustainable operations and delivery.
- Collaborate with product managers, scientists, and users to drive cross-team alignment and integration decisions that address complex technical challenges and shared priorities.
Requirements
- Bachelor’s degree in Computer Science or a related field and a minimum of 12 years of professional experience with large-scale scientific data and compute infrastructures, or an equivalent combination of education and experience.
- Proven experience leading the design, development, integration, and operation of production software and data systems that support metadata management, workflow orchestration, data lifecycle operations, and broad user data access.
- Advanced knowledge of data and software engineering fundamentals relevant to data-intensive distributed systems, including system design, concurrency, performance, and testing.
- Broad experience with databases and data storage technologies, including relational databases, object storage, and systems managing structured, semi-structured, and large-scale data.
- Experience with data engineering and event-driven technologies such as Airflow or Kafka.
- Strong experience using AI coding agents such as Claude Code, Codex, or Cursor, with the ability to review and validate generated software for production quality, security, and maintainability.
- Proficiency in Python and experience with one or more additional programming languages.
- Excellent communication skills, with the ability to present complex technical information to diverse audiences.
- Demonstrated ability to lead through influence in interdisciplinary environments, aligning users, stakeholders, and engineering teams around shared requirements and implementation plans.
Technologies
- Airflow
- Kafka
- Claude Code
- Codex
- Cursor
- Python
- WDL
- Nextflow
Benefits
- Exceptional health and retirement benefits, including pension or 401(k)-style plans
- A culture of belonging with strong investment in team wellbeing and growth
- Vacation and sick time plus a Winter Holiday Shutdown each year
- Parental bonding leave for both mothers and fathers
- Pet insurance