Principal Data Engineer – Safety Analytics (Global Medical Safety)
Job Description
A Principal Data Engineer leads safety analytics data engineering for Global Medical Safety, building scalable tools with AI, ML, and GenAI on Google Cloud Platform to support pharmacovigilance efforts.
Responsibilities
- Design and maintain production data pipelines and curated datasets that enable pharmacovigilance activities, including safety monitoring, analytics, and regulatory reporting.
- Ensure outputs are reproducible, explainable, and auditable to support safety decision making and inspection readiness.
- Enable AI, ML, and GenAI workflows for safety analytics, covering feature engineering and feature stores, embeddings and semantic retrieval, and Retrieval-Augmented Generation patterns.
- Own the end-to-end data lifecycle for safety analytics from source system intake through transformation, serving, and downstream consumption, ensuring continuity, traceability, and data integrity.
- Lead architectural decisions across ingestion, transformation, storage, and serving layers on GCP using tools like BigQuery, Dataform, and object storage.
- Design, implement, and automate scalable, reusable data pipelines and architectures to address evolving safety analytics needs.
- Establish and enforce data quality, validation, lineage, and observability standards for safety analytics datasets.
- Define and implement data governance practices, including data contracts, schema versioning, access control, stewardship, and lifecycle management.
- Ensure safety analytics data and systems meet Global Medical Safety requirements for reliability, auditability, and regulatory use.
- Apply GxP validation expertise to data pipelines, analytics services, and supporting infrastructure.
- Collaborate with quality and compliance teams to implement CSV/CSA-aligned controls, audit trails, documentation, and organizational change.
- Balance rapid delivery with the rigor required for regulated pharmacovigilance systems.
- Design and build APIs and microservices to operationalize safety analytics and ML capabilities, including feature serving, retrieval services, and analytics backends.
- Deploy and operate services on GCP (Cloud Run, GKE) with emphasis on security, scalability, and observability.
- Enforce contract-first integration patterns between producing and consuming systems to ensure reliability and safe evolution.
- Provision and manage cloud infrastructure using Terraform (Infrastructure as Code) on GCP.
- Build and maintain CI/CD pipelines (e.g., with Jenkins) for data pipelines, analytics services, feature pipelines, and ML data assets.
- Continuously optimize performance and cost efficiency of data and analytics infrastructure while maintaining compliance and reliability.
- Serve as the technical authority and data engineering leader for Safety Analytics within Global Medical Safety.
- Review and influence designs across pipelines, services, feature stores, and AI/ML integrations to uphold a high technical standard; collaborate with safety scientists, epidemiologists, biostatisticians, analytics teams, IT, and platform partners.
- Communicate complex technical concepts and tradeoffs clearly to both technical and non-technical stakeholders.
- Mentor and upskill teams through guidance and knowledge sharing on modern data, cloud, and AI technologies.
Requirements
- Master’s degree in Computer Science, Engineering, or a related field, or equivalent experience, is required.
- Minimum of 5 years of data engineering experience.
Technologies
- Python
- SQL
- Google Cloud Platform (GCP)
- BigQuery
- Dataform
- Terraform
- Jenkins
- Cloud Run
- GKE
Compensation
- Base annual pay range: USD 102,000 – 177,100
Location and work arrangement
Location: Horsham, PA, onsite. Preferred location: Horsham, PA or Titusville, NJ. Remote work considered on a case-by-case basis.
Benefits
- Consolidated retirement plan (pension)
- Savings plan (401(k))
- Vacation: 120 hours per calendar year
- Sick time: 40 hours per calendar year; 48 hours (Colorado residents); 56 hours (Washington residents)
- Holiday pay, including floating holidays: 13 days per calendar year
- Work, personal and family time: up to 40 hours per calendar year
- Parental leave: 480 hours within one year of birth/adoption/foster care
- Bereavement leave: 240 hours for immediate family; 40 hours for extended family per year
- Caregiver leave: 80 hours in a 52-week rolling period
- Volunteer leave: 32 hours per calendar year
- Military spouse time-off: 80 hours per calendar year