Hiring for Big Data (PySpark) Engineer - Chennai, Bangalore
Job Overview: We are looking for a Big Data (PySpark) Engineer to join our dynamic data engineering team. The ideal candidate will have hands-on experience with big data frameworks, particularly Apache Spark (with a focus on PySpark), and will be able to build, optimize, and manage large-scale data processing systems. As a Big Data Engineer, you will work closely with Data Scientists, Analysts, and Architects to design and implement efficient data pipelines and to ensure that our data infrastructure is scalable and robust.
Responsibilities:
Data Pipeline Development:
Design, develop, and maintain scalable and efficient data pipelines using PySpark and related big data technologies.
Data Processing & Transformation:
Write complex data transformations using PySpark to process large datasets and extract meaningful insights.
Optimization & Performance Tuning:
Optimize Spark jobs for performance, manage memory and data partitioning, and troubleshoot performance bottlenecks.
ETL Workflows:
Build and manage ETL (Extract, Transform, Load) workflows that process data from various sources (e.g., databases, APIs, file systems).
Data Integration:
Work with a variety of data sources, such as HDFS, AWS S3, Google Cloud Storage, and relational or NoSQL databases.
Data Quality & Monitoring:
Implement data validation checks and monitor data pipeline health, logging, and error handling.
Collaboration:
Collaborate with data scientists, analysts, and business teams to ensure data infrastructure meets the company's needs.
Cloud Computing:
Work with cloud platforms such as AWS, Azure, or GCP to deploy and manage big data workflows.
Documentation & Best Practices:
Maintain clear documentation for code, processes, and system designs. Promote best practices in coding, version control, and testing.
MNCJobsIndia.com will not be responsible for any payment made to a third party. All Terms of Use apply.