We are looking for a Data Engineer with over 4 years of professional experience in building, automating, and optimizing data pipelines and cloud-based architectures. The ideal candidate will have hands-on experience with cloud data services (AWS, Azure, or GCP) and CI/CD pipelines for deploying scalable, reliable, and secure data solutions.
The candidate will collaborate with cross-functional teams including data analysts, data scientists, and software engineers to design and maintain robust data infrastructure that supports analytics, AI/ML workflows, and enterprise reporting systems.
Key Responsibilities:
Design, build, and maintain end-to-end ETL/ELT pipelines using both on-premises and cloud-based technologies.
Architect and operate data storage and streaming solutions leveraging cloud-based services on AWS, Azure, or GCP.
Design and implement data ingestion and transformation workflows using Airflow, AWS Glue, or Azure Data Factory.
Develop and optimize data pipelines using Python and PySpark for large-scale distributed data processing.
Build normalized, denormalized, and dimensional (Star/Snowflake) data models for analytics and warehousing solutions.
Implement data quality, lineage, and governance controls using metadata management and monitoring tools.
Collaborate with cross-functional teams to deliver clean, reliable, and timely data for analytics and machine learning use cases.
Integrate CI/CD pipelines for data infrastructure deployment using GitHub Actions, Jenkins, or Azure DevOps.
Automate infrastructure provisioning using Infrastructure as Code (IaC) tools such as AWS CloudFormation or Terraform.
Monitor and optimize data processing performance for scalability, reliability, and cost-efficiency.
Enforce data security policies and ensure compliance with standards such as GDPR and HIPAA.
Must-Have Skills & Qualifications:
Education: Bachelor's or Master's degree in Computer Science, Information Technology, Data Engineering, or a related field.
Experience: Minimum 4 years of hands-on experience as a Data Engineer or in data-intensive environments.
SQL Expertise: Advanced proficiency in SQL for complex queries, joins, window functions, and performance tuning.
Analytical Databases: Experience working with Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, and PostgreSQL.
Query Optimization: Skilled in query optimization, indexing, and execution plan analysis for high-performance analytics workloads.
Programming: Proficient in Python and PySpark for data manipulation, automation, and pipeline orchestration.
Cloud Security: Experience managing IAM roles, access control, and encryption in cloud environments.
Pipeline Optimization: Skilled in optimizing data pipelines for performance, scalability, and cost-efficiency.
CI/CD and DevOps: Hands-on experience with CI/CD tools such as GitHub Actions, GitLab CI, or Azure DevOps.
Version Control: Proficient with Git and familiar with agile development practices.
Good-to-Have Skills:
Experience with containerization and orchestration.
Exposure to data cataloging and governance tools.
Experience with monitoring tools.
Familiarity with data APIs and microservices architecture.
Certification in cloud data engineering (e.g., AWS Certified Data Engineer, Azure Data Engineer Associate, or GCP Professional Data Engineer).
Experience supporting machine learning and analytics pipelines.
Soft Skills:
Strong analytical and problem-solving mindset.
Excellent communication and documentation skills.
Ability to work collaboratively in a cross-functional, fast-paced environment.
Strong attention to detail with a focus on data accuracy and reliability.
Eagerness to learn and adopt emerging data technologies.
Job Type: Full Time
Job Location: Kochi / Trivandrum