The ETL Developer will be responsible for designing, implementing, and optimizing distributed data processing jobs that handle large-scale data in the Hadoop Distributed File System (HDFS) using Apache Spark and Python. This role requires a deep understanding of data engineering principles, proficiency in Python, and hands-on experience with the Spark and Hadoop ecosystems. The developer will collaborate with data engineers, analysts, and business stakeholders to process and transform data and to drive insights and data-driven decisions.
Responsibilities:
Data Processing and Transformation:
Design and implement Spark applications to process and transform large datasets in HDFS.
Develop ETL pipelines in Spark using Python for data ingestion, cleaning, aggregation, and transformation (see the sketch below).
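A minimal PySpark sketch of such a pipeline follows. The HDFS paths, column names (order_id, amount, region), and output format are hypothetical assumptions for illustration, not a prescribed design.

    # Minimal ETL sketch: ingest from HDFS, clean, aggregate, write back.
    # Paths and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Ingestion: read raw CSV data from HDFS, inferring the schema.
    raw = (spark.read
           .option("header", True)
           .option("inferSchema", True)
           .csv("hdfs:///data/raw/orders"))

    # Cleaning: drop rows missing key fields and normalize the amount type.
    clean = (raw.dropna(subset=["order_id", "amount"])
             .withColumn("amount", F.col("amount").cast("double")))

    # Aggregation: total and average order amount per region.
    summary = clean.groupBy("region").agg(
        F.sum("amount").alias("total_amount"),
        F.avg("amount").alias("avg_amount"),
    )

    # Load: write the transformed result back to HDFS as Parquet.
    summary.write.mode("overwrite").parquet("hdfs:///data/curated/order_summary")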
Performance Optimization:
Optimize Spark jobs for efficiency, reducing runtime and resource usage.
Fine-tune memory management, caching, and partitioning strategies for optimal performance (see the sketch below).
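A minimal sketch of these tuning levers follows. The memory settings, partition counts, and paths are illustrative assumptions to be adjusted per cluster size and data volume, not recommended values.

    # Tuning sketch: all settings and counts below are assumptions.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("tuned_job")
             # Memory management: example executor memory settings (hypothetical).
             .config("spark.executor.memory", "4g")
             .config("spark.memory.fraction", "0.6")
             # Partitioning: cap shuffle partitions for a mid-size dataset.
             .config("spark.sql.shuffle.partitions", "200")
             .getOrCreate())

    df = spark.read.parquet("hdfs:///data/curated/order_summary")

    # Caching: persist a DataFrame that is reused across several actions.
    df.cache()
    df.count()   # first action materializes the cache

    # Repartition by a frequently filtered key to balance tasks, then
    # coalesce before writing to avoid many small output files.
    (df.repartition(64, "region")
       .coalesce(8)
       .write.mode("overwrite").parquet("hdfs:///data/tmp/summary_balanced"))

    df.unpersist()   # release cached blocks when no longer needed

Caching pays off only when a DataFrame feeds multiple actions; for a single pass it adds overhead, which is why the sketch materializes the cache explicitly before reuse.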
Data Engineering with Hadoop and Spark:
Load data from different sources into HDFS, ensuring data accuracy and integrity.
Integrate Spark applications with Hadoop ecosystem frameworks such as Hive and Sqoop (see the sketch below).
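A minimal sketch of loading source data into HDFS and exposing it to Hive follows. The database, table, path, and column names are hypothetical; the integrity check is one example of the kind of validation the role calls for.

    # Sketch: land external data in HDFS and register it as a Hive table.
    # Database/table/path/column names are hypothetical placeholders.
    from pyspark.sql import SparkSession

    # enableHiveSupport() lets Spark read and write Hive metastore tables.
    spark = (SparkSession.builder
             .appName("hive_load")
             .enableHiveSupport()
             .getOrCreate())

    # Load a source extract (e.g., JSON dropped by an upstream system) into HDFS.
    src = spark.read.json("hdfs:///landing/customers")

    # Basic integrity check before loading: fail fast on duplicate keys.
    if src.count() != src.select("customer_id").distinct().count():
        raise ValueError("Duplicate customer_id values in source extract")

    # Write as a partitioned, Hive-managed table queryable from Hive or Spark SQL.
    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
    (src.write
        .mode("overwrite")
        .partitionBy("country")
        .saveAsTable("analytics.customers"))

    # Verify through Spark SQL against the Hive metastore.
    spark.sql(
        "SELECT country, COUNT(*) FROM analytics.customers GROUP BY country"
    ).show()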
Testing and Debugging:
Troubleshoot and debug Spark job failures; monitor job logs and the Spark UI to identify issues (see the sketch below).
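A minimal sketch of defensive execution and log inspection follows; paths and names are illustrative. In practice, failed stages are also traced through the Spark UI (served by the driver, on port 4040 by default).

    # Sketch: surface the query plan and failures for troubleshooting.
    import logging
    from pyspark.sql import SparkSession

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("spark_job")

    spark = SparkSession.builder.appName("debuggable_job").getOrCreate()
    spark.sparkContext.setLogLevel("WARN")  # keep driver logs readable

    df = spark.read.parquet("hdfs:///data/curated/order_summary")

    # Inspect the physical plan before running: wide shuffles and skew
    # often show up here before they show up as failed stages.
    df.explain(True)

    try:
        df.write.mode("overwrite").parquet("hdfs:///data/out/summary")
    except Exception:
        # The stack trace names the failing stage; cross-reference the
        # stage ID in the Spark UI (Stages tab) for task-level metrics.
        log.exception("Spark job failed; check executor logs and the Spark UI")
        raise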
Qualifications: