Pipeline Development - Design, build, and deploy robust ETL/ELT pipelines (PySpark, SQL, Delta Lake) to ingest, transform, and curate governance and operational metadata from multiple sources landed in Databricks.
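A minimal, framework-agnostic sketch of the ingest-transform-curate shape described above; in Databricks the same steps would be expressed with PySpark DataFrames and Delta Lake writes. All field names (`asset_id`, `asset_name`) are illustrative, not from the source.

```python
def ingest(raw_records):
    """Ingest-time filter: drop records missing a primary key (illustrative rule)."""
    return [r for r in raw_records if r.get("asset_id") is not None]

def transform(records):
    """Normalize a text field and stamp a curation flag."""
    return [
        {**r, "asset_name": r["asset_name"].strip().lower(), "curated": True}
        for r in records
    ]

# Tiny worked example: one valid record survives, one is filtered out.
curated = transform(ingest([
    {"asset_id": 1, "asset_name": "  Orders  "},
    {"asset_id": None, "asset_name": "ghost"},
]))
```

The same two-stage structure maps directly onto a bronze-to-silver flow in a medallion layout.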
Granular Data Quality Capture - Implement profiling logic to capture issue-level metadata (source table, column, timestamp, severity, rule type) to support drill-down from dashboards into specific records and enable targeted remediation.
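The issue-level capture above can be sketched as a profiling rule that emits one record per failing row, carrying the listed metadata fields. This is a plain-Python stand-in for what would run as PySpark in Databricks; the rule (`not_null`) and table/column names are assumed for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class QualityIssue:
    """Issue-level record; fields mirror the metadata listed above."""
    source_table: str
    column: str
    rule_type: str
    severity: str
    captured_at: str

def profile_nulls(table_name, rows, column, severity="high"):
    """Sample profiling rule: emit one issue record per row where `column` is null."""
    issues = []
    for row in rows:
        if row.get(column) is None:
            issues.append(QualityIssue(
                source_table=table_name,
                column=column,
                rule_type="not_null",
                severity=severity,
                captured_at=datetime.now(timezone.utc).isoformat(),
            ))
    return [asdict(i) for i in issues]

rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]
issues = profile_nulls("crm.customers", rows, "email")
```

Because each failure is a standalone record rather than an aggregate count, dashboards can drill from a table-level score down to the exact offending rows.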
Governance Metrics Automation - Develop data pipelines to generate metrics for dashboards covering data quality, lineage, job monitoring, access & permissions, query cost, usage & consumption, retention & lifecycle, policy enforcement, sensitive data mapping, and governance KPIs.
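One way the metric generation could look, sketched for the data-quality slice only: roll issue-level records up into per-table dashboard metrics. The input schema (`source_table` keys, a `row_counts` map) is an assumption for illustration.

```python
from collections import defaultdict

def quality_metrics(issues, row_counts):
    """Aggregate issue-level records into per-table dashboard metrics.
    `row_counts` maps table name -> total rows profiled (hypothetical input)."""
    by_table = defaultdict(int)
    for issue in issues:
        by_table[issue["source_table"]] += 1
    metrics = {}
    for table, total in row_counts.items():
        failed = by_table.get(table, 0)
        metrics[table] = {
            "issues": failed,
            # Share of profiled rows with no issue recorded against this table.
            "pass_rate": round(1 - failed / total, 4) if total else None,
        }
    return metrics

m = quality_metrics(
    [{"source_table": "crm.customers"}, {"source_table": "crm.customers"}],
    {"crm.customers": 100, "sales.orders": 50},
)
```

The other metric families listed (lineage, cost, usage, and so on) would follow the same pattern: curate raw records, then aggregate into dashboard-ready rows.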
Microsoft Purview Integration - Automate asset onboarding, metadata enrichment, classification tagging, and lineage extraction for integration into governance reporting.
Data Retention & Policy Enforcement - Implement logic for retention tracking and policy compliance monitoring (masking, row-level security (RLS), exceptions).
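Retention tracking could be sketched as below: flag tables whose oldest retained data exceeds their retention window. The per-table schema (`retention_days`, `oldest_record`) is illustrative, not taken from the source.

```python
from datetime import date

def retention_status(tables, today=None):
    """Flag tables holding data older than their retention window.
    Each entry: {"name", "retention_days", "oldest_record": date} (assumed schema)."""
    today = today or date.today()
    out = []
    for t in tables:
        overdue = (today - t["oldest_record"]).days > t["retention_days"]
        out.append({"name": t["name"], "compliant": not overdue})
    return out

status = retention_status(
    [
        {"name": "logs.raw_events", "retention_days": 30,
         "oldest_record": date(2024, 1, 1)},   # 60 days old -> non-compliant
        {"name": "dw.dim_customer", "retention_days": 365,
         "oldest_record": date(2023, 12, 1)},  # within window -> compliant
    ],
    today=date(2024, 3, 1),
)
```

The boolean output is deliberately dashboard-shaped: one row per table, ready to land in a curated metrics table.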
Job & Query Monitoring - Build pipelines to track job performance, SLA adherence, and query costs, enabling cost and performance optimization.
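SLA adherence tracking, reduced to its core: compare run durations against a threshold and summarize. The run schema and the 60-minute SLA are assumptions for the example.

```python
def sla_report(runs, sla_minutes):
    """Summarize job runs against an SLA threshold (durations in minutes)."""
    breaches = [r for r in runs if r["duration_min"] > sla_minutes]
    return {
        "total_runs": len(runs),
        "sla_breaches": len(breaches),
        "breach_rate": round(len(breaches) / len(runs), 4) if runs else 0.0,
    }

report = sla_report(
    [
        {"job": "ingest_metadata", "duration_min": 42},
        {"job": "ingest_metadata", "duration_min": 95},  # breaches a 60-min SLA
        {"job": "curate_metrics", "duration_min": 12},
    ],
    sla_minutes=60,
)
```

In Databricks the run records would typically come from job run history rather than hand-built dicts; the aggregation shape stays the same.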
Metadata Storage & Optimization - Maintain curated Delta tables for governance metrics, structured for efficient dashboard consumption.
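Maintaining curated metric tables usually means idempotent upserts keyed by a metric identifier. Below is a plain-dict stand-in for a Delta `MERGE`: existing rows are updated in place, new rows are appended. The key name `metric_id` is illustrative.

```python
def upsert_metrics(table, updates, key="metric_id"):
    """Merge new metric rows into a curated table keyed by `key`
    (plain-Python stand-in for a Delta Lake MERGE)."""
    merged = {row[key]: row for row in table}
    for row in updates:
        # Update matched rows, insert unmatched ones.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())

existing = [
    {"metric_id": "dq_pass_rate", "value": 0.97},
    {"metric_id": "job_sla", "value": 0.99},
]
res = upsert_metrics(existing, [
    {"metric_id": "dq_pass_rate", "value": 0.98},  # matched -> updated
    {"metric_id": "query_cost", "value": 1240.5},  # unmatched -> inserted
])
```

Keying on a stable identifier keeps refreshes idempotent, so dashboards always read one current row per metric instead of an append-only log.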