The Mid-Level Data Engineer/DataOps Specialist is responsible for end-to-end design and implementation of our data architecture, spanning both data engineering and modelling lifecycles. You will develop and maintain AWS-native and Hadoop/Spark pipelines, apply relational and dimensional modelling best practices, and collaborate with analysts, data scientists, and actuaries to deliver high-quality data products. This role demands a pragmatic "data-as-a-product" mindset, strong Python and SQL skills, and the ability to optimize cloud infrastructure for performance, scale, and governance.
Key Responsibilities
Data Architecture & Modelling: Define logical and dimensional schemas; ensure normalization, relational integrity, and optimized designs for analytics and reporting.
AWS Pipeline Development: Build and operate ETL/ELT workflows with AWS Glue, Amazon Managed Workflows for Apache Airflow (MWAA), and AWS Data Pipeline.
Spark & Hadoop Ecosystem: Develop and tune Spark applications (PySpark/Scala) on EMR/Databricks; manage Hadoop clusters (HDFS, YARN, Hive). A minimal illustrative sketch of this kind of batch job follows this list.
Data Lake & Warehousing: Design S3-based data lakes (Lake Formation) and Redshift warehouses, optimizing distribution/sort keys and partitioning.
Infrastructure as Code: Provision and maintain AWS resources (VPCs, EMR/Spark clusters, Glue jobs) using Terraform or CloudFormation.
Streaming & Messaging: Implement real-time pipelines with Spark Structured Streaming, Amazon Kinesis, or Apache Kafka (MSK).
Data Quality & Governance: Embed tests and documentation in dbt workflows; enforce data quality via AWS Glue Data Quality or Deequ; maintain data lineage in Glue Catalog.
Performance & Monitoring: Profile and optimize pipelines, SQL queries, and Spark jobs; configure CloudWatch and Spark UI dashboards with alerts for anomalies.
Collaboration & Mentorship: Partner with cross-functional teams (BI, analytics, actuarial) to translate requirements; mentor junior engineers on best practices.
Continuous Improvement: Research and pilot new technologies (EMR Studio, Glue Studio, Amazon Athena, Databricks Delta Lake) to enhance our data platform.
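To give a concrete flavour of the pipeline work described above, the sketch below shows a minimal PySpark batch job: it reads raw records from S3, applies light cleansing, derives a partition column, and writes partitioned Parquet back to a curated data lake path. It is an illustrative sketch only; the bucket names, prefixes, and columns (claim_id, claim_date) are hypothetical placeholders, not part of any actual system described in this posting.

# Minimal illustrative PySpark batch job (hypothetical paths and columns).
# Assumes Spark and S3 access are already configured (e.g. on an EMR cluster).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("claims-daily-batch")  # hypothetical job name
    .getOrCreate()
)

# Read raw CSV files landed by an upstream ingestion process (hypothetical bucket/prefix).
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-raw-bucket/claims/")
)

# Light cleansing plus a derived partition column.
cleaned = (
    raw.dropDuplicates(["claim_id"])                       # hypothetical business key
       .withColumn("claim_date", F.to_date("claim_date"))  # normalise the date type
       .withColumn("claim_year", F.year("claim_date"))
)

# Write partitioned Parquet to the curated zone of the data lake (hypothetical path).
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("claim_year")
    .parquet("s3://example-curated-bucket/claims/")
)

spark.stop()

In practice a job like this would be orchestrated through MWAA or Glue workflows, covered by dbt or Deequ quality checks, and provisioned via Terraform or CloudFormation, as the responsibilities above describe.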
Required Skills
Proficient in Python and SQL (Redshift, Athena, Hive, Spark SQL).
Spark expertise (PySpark or Scala) and Hadoop cluster management.
Deep understanding of relational and dimensional modelling.
Infrastructure as Code with Terraform or CloudFormation.
Experience with dbt for transformation workflows and automated testing.
Excellent communication and stakeholder management skills.
Job Type: Permanent
Pay: ₹1,500,000.00 - ₹3,000,000.00 per year
Benefits:
Work from home
Schedule:
Day shift
Experience:
Data Engineering: 4 years (Required)
DataOps: 3 years (Required)
AWS experience (S3, Glue, Redshift, EMR, Lambda): 3 years (Required)
Python and SQL: 3 years (Required)
Spark expertise (PySpark or Scala) and Hadoop: 3 years (Required)
Infrastructure as Code: 3 years (Required)
dbt (data build tool): 3 years (Required)
Data Architecture & Modelling: 3 years (Required)
Insurance domain (claims or catastrophe modelling): 2 years (Required)
Work Location: Remote
Beware of fraud agents! Do not pay money to get a job.
MNCJobsIndia.com will not be responsible for any payment made to a third party. All Terms of Use are applicable.
Job Detail
Job Id: JD3795839
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: Remote, IN, India
Education: Not mentioned
Experience: Year