Data Scientist / Data Analytics Engineer
Job Description
The Data Scientist / Data Analytics Engineer role at Transflo is a remote position focused on designing, building, and operationalizing advanced analytics for transportation and logistics, delivering both predictive and point-in-time insights on AWS.
Responsibilities
- Design, train, validate, and deploy predictive models spanning regression, classification, time-series forecasting, survival analysis, clustering, anomaly detection, and gradient-boosted or deep learning approaches as appropriate to the problem.
- Lead model selection, hyperparameter tuning, cross-validation, and rigorous performance evaluation using business-aligned metrics such as precision/recall trade-offs, MAPE, RMSE, lift, and calibration.
- Develop data products for transportation use cases such as operational metrics, fraud signals, pricing analytics, and industry trends.
- Establish model monitoring, drift detection, retraining cadence, and explainability practices (SHAP, feature importance, partial dependence) to keep production models trustworthy.
- Produce point-in-time analytics, KPI scorecards, and exception reporting to support daily decisions across dispatch, fleet, customer success, finance, and product teams.
- Partner with business stakeholders to translate questions into well-scoped analyses and deliver clear, defensible insights with documented assumptions and data lineage.
- Build and maintain reusable analytical datasets, semantic layers, and certified metrics to ensure a consistent source of truth.
- Build and maintain data pipelines (batch and streaming) on AWS using Redshift, S3, Glue, Lambda, Step Functions, Kinesis, MSK, EMR, Athena, and SageMaker.
- Implement a medallion (bronze/silver/gold) architecture to progressively refine raw operational data into analytics-ready and ML-ready datasets.
- Apply STARR dimensional modeling to build performant data models in the Redshift warehouse layer.
- Drive data selection, curation, profiling, and quality enforcement; define source-of-truth datasets, document lineage, and codify data contracts and validation tests.
- Collaborate with data engineering and platform teams on CI/CD for data and ML assets, infrastructure as code, and cost-aware AWS design.
- Partner with product management, design, and engineering to turn customer-facing analytics ideas into shipped capabilities.
- Contribute to product discovery through customer interviews, opportunity sizing, prototyping, and rapid iteration of analytical concepts.
- Own the analytical correctness of customer-facing metrics, models, and visualizations, including definitions, edge cases, and explanations for non-technical users.
- Define and measure success metrics for analytics features and drive iterative improvements post-launch.
- Translate complex analyses into clear narratives and visuals for technical and non-technical audiences, including executives and customers.
- Partner cross-functionally with product, engineering, operations, and commercial teams to embed analytics into workflows and customer-facing products.
- Mentor analysts and engineers on statistical rigor, modeling best practices, and modern data architecture.
Requirements
- Bachelor's degree in Statistics, Mathematics, or Supply Chain Management (Computer Science is also acceptable); a Master's degree is preferred but not required.
- Professional experience in transportation, trucking, freight, logistics, or related supply chain fields, with working knowledge of operational data (loads, stops, shipments, ELD/telematics, TMS, dispatch, billing, etc.).
- Proven track record launching customer-facing analytics products from idea through production, including discovery, scoping, model and metric design, collaboration with product/engineering, and production support with real customers. At least one end-to-end example is expected.
- Strong experience building end-to-end analytics models, including problem framing, data curation, feature engineering, model training and validation, and deployment.
- Hands-on experience with AWS PaaS and analytics tooling, including Redshift and related services (S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, SageMaker).
- Proficiency in SQL (advanced window functions, performance tuning on Redshift or similar) and at least one analytics programming language such as Python, with libraries like pandas, scikit-learn, statsmodels, XGBoost/LightGBM, and PyTorch or TensorFlow as appropriate.
- Experience designing and operating production data pipelines with attention to orchestration, idempotency, observability, and data quality.
- Solid grounding in statistical methods including hypothesis testing, experimental design, regression, time-series analysis, and uncertainty quantification.
Technologies
- AWS, Redshift, S3, Glue, Lambda, Step Functions, Kinesis, MSK, EMR, Athena, SageMaker
- Python, pandas, scikit-learn, statsmodels, XGBoost, LightGBM, PyTorch, TensorFlow
- Jupyter, SQL
- BI / Visualization tools: QuickSight, Power BI, Looker
- Orchestration / DevOps: Airflow, Git, CI/CD, Terraform, CloudFormation
- Medallion architecture, STARR modeling
Preferred Qualifications
- Master's degree in Statistics, Mathematics, Operations Research, Supply Chain, Computer Science, or a closely related quantitative field.
- Experience implementing medallion architecture in a cloud data lakehouse or warehouse environment.
- Experience designing STARR or star-schema dimensional models for analytics consumption.
- Experience with streaming or event-driven data (Kinesis, Kafka/MSK) for near real-time analytics in transportation contexts.
- Experience deploying and monitoring ML models in production using SageMaker, MLflow, or equivalent MLOps tooling.
- Familiarity with BI visualization tools and semantic layer concepts.
- Exposure to optimization or operations research techniques applied to transportation problems.
- Experience with ELD/HOS data, telematics feeds, geospatial data, or TMS/dispatch data, and with transportation back-office operations.
Core Competencies
- Analytical rigor and the ability to defend methodology, assumptions, and uncertainty.
- Business pragmatism and the capacity to ship value quickly with practical models.
- Product mindset for customer-facing analytics and willingness to iterate with product and engineering partners.
- Engineering discipline, reproducibility, and data lineage awareness.
- Stakeholder partnership and clear communication of trade-offs.
- Curiosity and ownership in identifying data quality issues and driving root-cause resolution.
Representative Tech Environment
- Cloud and Data Platform: AWS stack including Redshift, S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, SageMaker
- Modeling and Analysis: Python with pandas, scikit-learn, statsmodels, XGBoost/LightGBM, PyTorch/TensorFlow; SQL; Jupyter
- Data Architecture: Medallion approach; STARR models; data contracts and lineage tooling
- Orchestration and DevOps: Airflow, Step Functions, Git, CI/CD, Terraform or CloudFormation
- Visualization: QuickSight, Power BI, Looker