Senior PySpark Data Engineer
Job Description
The Senior PySpark Data Engineer will design, develop, and maintain data solutions within a Big Data environment, using PySpark and Python to build scalable pipelines, ensure data quality, and migrate data from on-premises systems to cloud deployments. This onsite role is based in Irving, Texas, with Tata Consultancy Services.
Responsibilities
- Architect and maintain robust, scalable data pipelines using PySpark to support high-performance data processing.
- Implement ETL processes that migrate and deploy data across systems while ensuring data quality.
- Translate Ab Initio ETL applications into PySpark-based data pipelines.
- Migrate on-premises workloads to cloud environments (AWS, Databricks, Snowflake) according to use case requirements.
- Collaborate with cross-functional teams to identify and resolve data-related issues.
- Stay informed of the latest advancements in data engineering and integrate innovative approaches to maintain a competitive edge.
Requirements
- 8+ years of professional experience in Hadoop and PySpark/Python development.
- Proven expertise in PySpark with experience handling large volumes of data.
- 3+ years of hands-on experience with AWS, Databricks/Snowflake, and Airflow.
- Familiarity with CI/CD pipelines and version control systems such as Git.
- Strong debugging and problem-solving skills.
- Excellent communication and collaboration abilities.
Technologies
- PySpark
- Python
- Hadoop
- Ab Initio
- AWS
- Databricks
- Snowflake
- Airflow
- Git
- Docker
- AWS EKS
Location
Irving, TX (onsite)
Job Function
Technology
Role
Engineer
Job ID
411835
Salary
USD 120,000 - 140,000 per year
Desired Skills
- Hadoop
Desired Candidate Profile
- Bachelor's degree in Computer Science