Key Responsibilities
Set up and maintain monitoring dashboards for ETL jobs using Datadog, including metrics, logs, and alerts.
Monitor daily ETL workflows and proactively detect and resolve data pipeline failures or performance issues.
Create Datadog Monitors for job status (success/failure), job duration, resource utilization, and error trends (an example monitor definition is sketched after this list).
Work closely with Data Engineering teams to onboard new pipelines and ensure observability best practices.
Integrate Datadog with the surrounding toolchain, such as ETL orchestrators and incident-management or ticketing systems (e.g., Jira, ServiceNow), so alerts flow into existing workflows.
Conduct root cause analysis of ETL failures and performance bottlenecks.
Tune thresholds, baselines, and anomaly detection settings in Datadog to reduce false positives.
Document incident handling procedures and contribute to improving overall ETL monitoring maturity.
Participate in on-call rotations or scheduled support windows to manage ETL health.
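To illustrate the monitor-creation responsibility above, here is a minimal sketch that creates a Datadog metric monitor through the public Monitors API. The metric name (etl.job.failures), pipeline tag, thresholds, and Slack handle are placeholder assumptions, not details from this posting; real values depend on how the ETL jobs emit telemetry.

```python
"""Sketch: create a Datadog monitor that alerts on ETL job failures.

Assumes a hypothetical custom metric `etl.job.failures` tagged by pipeline,
and DD_API_KEY / DD_APP_KEY set in the environment.
"""
import os
import requests

DD_MONITOR_API = "https://api.datadoghq.com/api/v1/monitor"  # adjust for non-US Datadog sites

monitor = {
    "name": "ETL job failures - orders pipeline",
    "type": "metric alert",
    # Alert if any failure is recorded in the last 15 minutes (placeholder metric and tag).
    "query": "sum(last_15m):sum:etl.job.failures{pipeline:orders}.as_count() > 0",
    "message": "ETL job failures detected in the orders pipeline. @slack-data-alerts",  # placeholder handle
    "options": {
        "thresholds": {"critical": 0},
        "notify_no_data": True,   # also surface silent pipeline stalls
        "no_data_timeframe": 60,  # minutes without data before alerting
        "renotify_interval": 30,
    },
    "tags": ["team:data-eng", "service:etl"],
}

resp = requests.post(
    DD_MONITOR_API,
    json=monitor,
    headers={
        "DD-API-KEY": os.environ["DD_API_KEY"],
        "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
    },
    timeout=10,
)
resp.raise_for_status()
print("Created monitor", resp.json()["id"])
```

In practice, monitor definitions like this are usually kept in version control (e.g., Terraform or scripted provisioning) so thresholds and baselines can be tuned and reviewed over time.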
Required Skills & Qualifications
3+ years of experience in ETL/data pipeline monitoring, preferably in a cloud or hybrid environment.
Proficiency in using Datadog for metrics, logging, alerting, and dashboards.
Strong understanding of ETL concepts and tools (e.g., Airflow, Informatica, Talend, AWS Glue, or dbt).
Familiarity with SQL and querying large datasets.
Experience with Python or shell scripting (e.g., Bash) for automation and log parsing (an example log-parsing script is sketched after this list).
Understanding of cloud platforms (AWS/GCP/Azure) and services such as S3, Redshift, and BigQuery.
Knowledge of CI/CD and DevOps principles related to data infrastructure monitoring.
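As a minimal sketch of the log-parsing work referenced above: the script below counts ERROR lines per ETL job across plain-text log files. The log directory, line format, and job= field are assumptions for illustration only.

```python
"""Sketch: summarize errors per ETL job from plain-text logs.

Assumes (hypothetically) lines like:
  2024-05-01 02:15:03 ERROR job=orders_load msg=Timeout connecting to source
"""
import re
import sys
from collections import Counter
from pathlib import Path

# Match ERROR lines and capture the job identifier (placeholder format).
LINE_RE = re.compile(r"\bERROR\b.*?\bjob=(?P<job>\S+)")


def count_errors(log_dir: str) -> Counter:
    """Count ERROR lines per job across all *.log files under log_dir."""
    errors = Counter()
    for log_file in Path(log_dir).glob("*.log"):
        with log_file.open(errors="replace") as fh:
            for line in fh:
                match = LINE_RE.search(line)
                if match:
                    errors[match.group("job")] += 1
    return errors


if __name__ == "__main__":
    log_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/log/etl"  # placeholder path
    for job, count in count_errors(log_dir).most_common():
        print(f"{job}\t{count}")
```

Output like this can feed custom Datadog metrics or simply support root cause analysis during incident triage.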
Preferred Qualifications
Experience with distributed tracing and APM in Datadog.
Prior experience monitoring Spark, Kafka, or streaming pipelines.
Familiarity with ticketing tools (e.g., Jira, ServiceNow) and incident management workflows.