We are looking for an experienced PySpark Developer with strong data engineering expertise to design, develop, and optimize scalable data pipelines for large-scale data processing. The role involves working across distributed systems, ETL/ELT frameworks, cloud data platforms, and analytics-driven architectures. You will collaborate closely with cross-functional teams to ensure the efficient ingestion, transformation, and delivery of high-quality data.
Key Responsibilities
Design and develop robust, scalable ETL/ELT pipelines using PySpark to process data from databases, APIs, logs, and file-based sources.
Convert raw data into analysis-ready datasets for data hubs and analytical data marts.
Build reusable, parameterized Spark jobs for batch and micro-batch processing (see the sketch after this list).
Optimize PySpark performance to handle large and complex datasets.
Ensure data quality, consistency, and lineage, and maintain detailed documentation for all ingestion workflows.
Collaborate with Data Architects, Data Modelers, and Data Scientists to implement data ingestion logic aligned with business requirements.
Work with AWS services (S3, Glue, EMR, Redshift) for data ingestion, storage, and processing.
Support version control, CI/CD practices, and infrastructure-as-code workflows as needed.
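For context, the snippet below is a minimal, hypothetical sketch of the kind of reusable, parameterized PySpark batch job these responsibilities describe. The paths, app name, and column names (order_id, event_ts) are illustrative assumptions only and do not reflect any existing pipeline.

```python
# Minimal sketch of a parameterized PySpark batch ingestion job (assumptions noted above).
import argparse

from pyspark.sql import SparkSession, functions as F


def run(source_path: str, target_path: str, load_date: str) -> None:
    spark = SparkSession.builder.appName("ingest_orders").getOrCreate()

    # Read raw JSON landed in the data lake (e.g., an S3 prefix).
    raw = spark.read.json(source_path)

    # Basic cleansing: drop records missing the key and derive an event date
    # from the raw timestamp, keeping only the requested load date.
    cleaned = (
        raw.dropna(subset=["order_id"])
           .withColumn("event_date", F.to_date("event_ts"))
           .filter(F.col("event_date") == F.lit(load_date))
    )

    # Write analysis-ready Parquet, partitioned by date, for downstream data marts.
    cleaned.write.mode("overwrite").partitionBy("event_date").parquet(target_path)

    spark.stop()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--source-path", required=True)
    parser.add_argument("--target-path", required=True)
    parser.add_argument("--load-date", required=True)  # e.g. 2024-01-31
    args = parser.parse_args()
    run(args.source_path, args.target_path, args.load_date)
```

Parameterizing the source path, target path, and load date in this way is what lets a single Spark job be reused across batch and micro-batch schedules via an orchestrator such as AWS Step Functions.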
Must-Have Skills
Minimum 5+ years of data engineering experience, with a strong focus on PySpark/Spark.
Proven experience building ingestion frameworks for relational, semi-structured (JSON, XML), and unstructured data (logs, PDFs).
Strong Python skills, including experience with common data processing libraries.
Advanced SQL proficiency (Redshift, PostgreSQL, or similar).
Hands-on experience with distributed computing platforms (Spark on EMR, Databricks, etc.).
Familiarity with workflow orchestration tools (AWS Step Functions or similar).
Strong understanding of data lake and data warehouse architectures, including core data modeling concepts.
Good-to-Have Skills
Experience with AWS services: Glue, S3, Redshift, Lambda, CloudWatch, etc.
Exposure to Delta Lake or similar large-scale storage frameworks.
Experience with real-time streaming tools: Spark Structured Streaming, Kafka.
Understanding of data governance, lineage, and cataloging tools (Glue Catalog, Apache Atlas).
Knowledge of DevOps and CI/CD pipelines (Git, Jenkins, etc.).
Job Type: Full-time
Pay: ₹1,400,000.00 - ₹1,800,000.00 per year
Application Question(s):
How many years of experience do you have as a PySpark Developer?
Have you worked with Python, Amazon Redshift, and PostgreSQL?
What is your current location?
Mention your notice period, current CTC, and expected CTC.
Work Location: In person