Expertise in workflow orchestration using Apache Airflow.
Experience with Cloud SQL or other relational databases (PostgreSQL preferred).
Solid understanding of data modeling, ETL processes, and streaming/batch data pipelines.
Familiarity with DevOps practices, CI/CD, and containerization (Docker, Kubernetes).
Key Responsibilities and Skills
Build Data Pipelines on GCP: The role requires hands-on experience with the core GCP services used for data processing (a brief pipeline sketch follows this list).
Dataflow: For building both batch and streaming data pipelines.
Pub/Sub: For real-time messaging and event ingestion.
Cloud Storage: For storing raw and processed data.
Cloud Composer: A managed Apache Airflow service, used for orchestrating complex data workflows.
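As an illustration of the kind of Dataflow work involved, here is a minimal batch pipeline sketch using the Apache Beam Python SDK; the bucket paths and the transform are placeholders, not details from this posting.

import apache_beam as beam

# Read raw CSV lines from Cloud Storage, drop blank lines, and write the
# cleaned output back under a processed/ prefix. All paths are illustrative.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read raw files" >> beam.io.ReadFromText("gs://example-bucket/raw/*.csv")
        | "Drop blank lines" >> beam.Filter(lambda line: line.strip() != "")
        | "Write processed" >> beam.io.WriteToText("gs://example-bucket/processed/part")
    )

The same Beam code can run locally for testing or on Dataflow by supplying the appropriate runner in the pipeline options.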
Orchestrate Workflows: A key requirement is expertise in Apache Airflow. This tool is essential for scheduling, monitoring, and managing data workflows, often represented as Directed Acyclic Graphs (DAGs).
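To give a sense of what such a workflow looks like, below is a minimal Airflow DAG sketch, assuming Airflow 2.x; the task names and daily schedule are illustrative assumptions, not requirements from this posting.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extract step")

def load():
    print("load step")

# Two tasks wired into a simple DAG: extract must finish before load runs.
with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task

On GCP, a DAG file like this is typically deployed to Cloud Composer by placing it in the environment's DAGs bucket.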
Manage Databases: The role requires experience with Cloud SQL or other relational databases, with a preference for PostgreSQL. This suggests the role involves working with structured data storage.
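As a rough illustration only, a PostgreSQL-backed Cloud SQL instance is commonly queried through a standard driver such as psycopg2; the connection details and the orders table below are placeholder assumptions.

import psycopg2

# Connect via the Cloud SQL Auth Proxy (or a private IP) and run a query.
# Host, credentials, and the table are illustrative placeholders.
conn = psycopg2.connect(
    host="127.0.0.1",
    dbname="analytics",
    user="etl_user",
    password="example-password",
)
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT order_id, amount FROM orders WHERE created_at >= %s",
        ("2024-01-01",),
    )
    for order_id, amount in cur.fetchall():
        print(order_id, amount)
conn.close()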
Understand Data Concepts: A strong foundation in the theoretical side of data engineering is crucial.
Data Modeling: Designing the structure of databases to meet business needs.
ETL Processes: The classic Extract, Transform, Load pipeline for moving and processing data (a brief sketch follows this list).
Streaming/Batch Data Pipelines: Understanding the differences and how to build both types of pipelines.
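For illustration, here is a tiny, self-contained ETL sketch in plain Python; the CSV source, the transformation rules, and the SQLite target are stand-ins for whatever real sources and warehouse the team uses.

import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from a CSV file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: cast amounts to floats and drop rows without an id.
    return [
        {"id": r["id"], "amount": float(r["amount"])}
        for r in rows
        if r.get("id")
    ]

def load(rows, db_path="warehouse.db"):
    # Load: append the cleaned rows to a local SQLite table.
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO orders (id, amount) VALUES (:id, :amount)", rows
        )

load(transform(extract("orders.csv")))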
Apply DevOps Practices: Familiarity with modern software development practices is a plus.
CI/CD: Continuous Integration and Continuous Delivery for automating code deployment.
Containerization: Using technologies like Docker and Kubernetes to package and deploy applications consistently across different environments.