The Data Scientist / Data Analytics Engineer role at Transflo is a remote position focused on designing, building, and operationalizing advanced analytics for transportation and logistics, delivering both predictive and point-in-time insights on AWS.

Responsibilities

Design, train, validate, and deploy predictive models spanning regression, classification, time-series forecasting, survival analysis, clustering, anomaly detection, and gradient-boosted or deep learning approaches as appropriate to the problem.
Lead model selection, hyperparameter tuning, cross-validation, and rigorous performance evaluation using business-aligned metrics such as precision/recall trade-offs, MAPE, RMSE, lift, and calibration.
Develop data products in transportation domains including operational metrics, fraud signals, pricing analytics, and industry trends.
Establish model monitoring, drift detection, retraining cadence, and explainability practices (SHAP, feature importance, partial dependence) to keep production models trustworthy.
Produce point-in-time analytics, KPI scorecards, and exception reporting to support daily decisions across dispatch, fleet, customer success, finance, and product teams.
Partner with business stakeholders to translate questions into well-scoped analyses and deliver clear, defensible insights with documented assumptions and data lineage.
Build and maintain reusable analytical datasets, semantic layers, and certified metrics to ensure a consistent source of truth.
Build and maintain data pipelines (batch and streaming) on AWS using Redshift, S3, Glue, Lambda, Step Functions, Kinesis, MSK, EMR, Athena, and SageMaker.
Implement medallion architecture to progressively refine raw operational data into analytics-ready and ML-ready datasets.
Apply STARR dimensional modeling to construct performant data models in Redshift and the warehouse layer.
Drive data selection, curation, profiling, and quality enforcement; define source-of-truth datasets, document lineage, and codify data contracts and validation tests.
Collaborate with data engineering and platform teams on CI/CD for data and ML assets, infrastructure as code, and cost-aware AWS design.
Turn customer-facing analytics ideas into shipped capabilities through partnerships with product management, design, and engineering.
Contribute to product discovery through customer interviews, opportunity sizing, prototyping, and rapid iteration of analytical concepts.
Own the analytical correctness of customer-facing metrics, models, and visualizations, including definitions, edge cases, and explanations for non-technical users.
Define and measure success metrics for analytics features and drive iterative improvements post-launch.
Translate complex analyses into clear narratives and visuals for technical and non-technical audiences, including executives and customers.
Partner cross-functionally with product, engineering, operations, and commercial teams to embed analytics into workflows and customer-facing products.
Mentor analysts and engineers on statistical rigor, modeling best practices, and modern data architecture.

Requirements

Bachelor's degree in Statistics, Mathematics, or Supply Chain Management; Computer Science is acceptable. Master’s degree preferred but not required.
Professional experience in transportation, trucking, freight, logistics, or related supply chain fields, with working knowledge of operational data (loads, stops, shipments, ELD/telematics, TMS, dispatch, billing, etc.).
Proven track record of launching customer-facing analytics products from idea through production, including discovery, scoping, model and metric design, collaboration with product/engineering, and production support with real customers. Candidates are expected to provide an end-to-end example.
Strong experience building end-to-end analytics models, including problem framing, data curation, feature engineering, model training and validation, and deployment.
Hands-on experience with AWS PaaS and analytics tooling, including Redshift and related services (S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, SageMaker).
Proficiency in SQL (advanced window functions, performance tuning on Redshift or similar) and at least one analytics programming language such as Python, with libraries like pandas, scikit-learn, statsmodels, XGBoost/LightGBM, and PyTorch or TensorFlow as appropriate.
Experience designing and operating production data pipelines with attention to orchestration, idempotency, observability, and data quality.
Solid grounding in statistical methods including hypothesis testing, experimental design, regression, time-series analysis, and uncertainty quantification.

Technologies

AWS, Redshift, S3, Glue, Lambda, Step Functions, Kinesis, MSK, EMR, Athena, SageMaker
Python, pandas, scikit-learn, statsmodels, XGBoost, LightGBM, PyTorch, TensorFlow
Jupyter, SQL
BI / Visualization tools: QuickSight, Power BI, Looker
Orchestration / DevOps: Airflow, Git, CI/CD, Terraform, CloudFormation
Medallion architecture, STARR modeling

Preferred Qualifications

Master's degree in Statistics, Mathematics, Operations Research, Supply Chain, Computer Science, or a closely related quantitative field.
Experience implementing medallion architecture in a cloud data lakehouse or warehouse environment.
Experience designing STARR or star-schema dimensional models for analytics consumption.
Experience with streaming or event-driven data (Kinesis, Kafka/MSK) for near real-time analytics in transportation contexts.
Experience deploying and monitoring ML models in production using SageMaker, MLflow, or equivalent MLOps tooling.
Familiarity with BI visualization tools and semantic layer concepts.
Exposure to optimization or operations research techniques applied to transportation problems.
Experience with ELD/HOS data, telematics feeds, geospatial data, or TMS/dispatch data, and with transportation back-office operations.

Core Competencies

Analytical rigor and the ability to defend methodology, assumptions, and uncertainty.
Business pragmatism and the capacity to ship value quickly with practical models.
Product mindset for customer-facing analytics and willingness to iterate with product and engineering partners.
Engineering discipline, reproducibility, and data lineage awareness.
Stakeholder partnership and clear communication of trade-offs.
Curiosity and ownership in identifying data quality issues and driving root-cause resolution.

Representative Tech Environment

Cloud and Data Platform: AWS stack including Redshift, S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, SageMaker
Modeling and Analysis: Python with pandas, scikit-learn, statsmodels, XGBoost/LightGBM, PyTorch/TensorFlow; SQL; Jupyter
Data Architecture: Medallion approach; STARR models; data contracts and lineage tooling
Orchestration and DevOps: Airflow, Step Functions, Git, CI/CD, Terraform or CloudFormation
Visualization: QuickSight, Power BI, Looker
