CLOUDSUFI, a Google Cloud Premier Partner, is a leading global provider of data-driven digital transformation for cloud-based enterprises. With a global presence and a focus on Software & Platforms, Life Sciences & Healthcare, Retail, CPG, Financial Services, and Supply Chain, CLOUDSUFI is positioned to meet customers where they are in their data monetization journey.
Our Values
We are a passionate and empathetic team that prioritizes human values. Our purpose is to elevate the quality of life for our families, customers, partners, and the community.
Equal Opportunity Statement
CLOUDSUFI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified candidates receive consideration for employment without regard to race, colour, religion, gender, gender identity or expression, sexual orientation, or national origin. We provide equal opportunities in employment, advancement, and all other areas of our workplace. Please explore more at https://www.cloudsufi.com/
Location:
Noida, Uttar Pradesh, India (Hybrid)
Job Summary
We are seeking a highly skilled and motivated Data Engineer to join our Development POD for the Integration Project. The ideal candidate will be responsible for designing, building, and maintaining robust data pipelines to ingest, clean, transform, and integrate diverse public datasets into our knowledge graph. This role requires a strong understanding of Google Cloud Platform (GCP) services, data engineering best practices, and a commitment to data quality and scalability.
Key Responsibilities
ETL Development:
Design, develop, and optimize data ingestion, cleaning, and transformation pipelines for various data sources (e.g., CSV, API, XLS, JSON, SDMX) using Google Cloud Platform services (Cloud Run, Dataflow) and Python.
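By way of illustration, the sketch below outlines one minimal ingest-clean-load step of the kind such a pipeline might contain; the bucket, file, and table names are hypothetical, and a production pipeline on Cloud Run or Dataflow would add retries, logging, and schema enforcement.

```python
# Minimal sketch of one ingest -> clean -> load step (hypothetical names throughout).
import io

import pandas as pd
from google.cloud import bigquery, storage


def ingest_csv_to_bigquery(bucket_name: str, blob_path: str, table_id: str) -> None:
    # Download the raw CSV from Cloud Storage.
    blob = storage.Client().bucket(bucket_name).blob(blob_path)
    df = pd.read_csv(io.BytesIO(blob.download_as_bytes()))

    # Basic cleaning: normalize column names and drop fully empty rows.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.dropna(how="all")

    # Load the transformed frame into BigQuery and wait for the job to finish.
    bigquery.Client().load_table_from_dataframe(df, table_id).result()


if __name__ == "__main__":
    ingest_csv_to_bigquery("example-raw-data", "public/indicators.csv",
                           "example_project.staging.indicators")
```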
Data Modelling & Storage Design:
Perform data modelling and architecture design for storing structured, semi-structured, and unstructured data across databases and GCS. Design highly scalable schemas to support taxonomy, metadata, transactional data, and hierarchical relationships. Ensure models support efficient querying, versioning, extensibility, and downstream analytics use cases.
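As a rough illustration of the adjacency-list pattern often used for taxonomies and other hierarchical data, the sketch below defines a self-referencing table in SQLAlchemy; the table, column, and version fields are hypothetical.

```python
# Hypothetical self-referencing taxonomy table supporting hierarchical relationships.
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()


class TaxonomyNode(Base):
    __tablename__ = "taxonomy_node"

    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False, index=True)
    version = Column(String, nullable=False, default="v1")  # explicit versioning
    parent_id = Column(Integer, ForeignKey("taxonomy_node.id"), nullable=True)

    # Children resolve through the self-referencing foreign key (adjacency list).
    children = relationship("TaxonomyNode")


if __name__ == "__main__":
    # Create the schema in an in-memory SQLite database for a quick sanity check.
    Base.metadata.create_all(create_engine("sqlite:///:memory:"))
```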
Database & SQL Expertise:
Demonstrate strong expertise in relational databases (e.g., PostgreSQL), including:
Writing complex SQL queries, joins, and subqueries
Designing tables, indexes, keys, and constraints
Developing stored procedures, functions, and views
Optimizing database performance and ensuring data consistency and integrity
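The snippet below gives a flavour of this kind of query work: a join with a subquery against a hypothetical PostgreSQL schema, run through psycopg2. Connection details and table names are illustrative only.

```python
# Illustrative only: a join plus a subquery against hypothetical tables.
import psycopg2

QUERY = """
SELECT d.dataset_id,
       d.name,
       COUNT(o.observation_id) AS observation_count
FROM   dataset d
JOIN   observation o ON o.dataset_id = d.dataset_id
WHERE  d.updated_at > (
           SELECT MAX(run_finished_at) FROM pipeline_run WHERE status = 'success'
       )
GROUP  BY d.dataset_id, d.name
ORDER  BY observation_count DESC;
"""


def recently_updated_report(dsn: str) -> list[tuple]:
    # Datasets updated since the last successful pipeline run, with observation counts.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(QUERY)
        return cur.fetchall()
```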
Data Hunting & Ingestion:
Proactively identify, evaluate, and hunt for high-quality data sources for specific technologies, domains, and business use cases. Build automation to ingest newly discovered datasets into the data corpus with minimal manual effort. Leverage LLM APIs to handle unknown schemas, unstructured inputs, and edge cases in data extraction and ingestion.
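One hedged sketch of that idea: send a small sample of an unfamiliar file to an LLM endpoint and ask it to propose a column schema. The endpoint URL and response shape below are purely hypothetical placeholders for whichever LLM API the team standardizes on.

```python
# Hypothetical: ask an LLM endpoint to propose a schema for an unfamiliar delimited file.
import json

import requests

LLM_ENDPOINT = "https://llm.example.com/v1/complete"  # placeholder URL


def propose_schema(sample_rows: list[str]) -> dict:
    prompt = (
        "Given these sample rows from an unknown dataset, return a JSON object "
        "mapping column names to types (string, integer, float, date):\n"
        + "\n".join(sample_rows)
    )
    resp = requests.post(LLM_ENDPOINT, json={"prompt": prompt}, timeout=60)
    resp.raise_for_status()
    # Assumes the endpoint returns {"text": "<json schema>"}; validate before trusting it.
    return json.loads(resp.json()["text"])
```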
Data Validation & Quality Assurance:
Implement comprehensive data validation and quality checks (statistical, schema, anomaly detection, consistency) to ensure data integrity, accuracy, and freshness. Troubleshoot and resolve data quality errors.
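A minimal sketch of such checks, assuming a pandas DataFrame and illustrative column names and thresholds:

```python
# Minimal validation pass: required columns, null rates, and a simple z-score anomaly check.
import pandas as pd

REQUIRED_COLUMNS = {"country_code", "year", "value"}  # illustrative schema


def validate(df: pd.DataFrame) -> list[str]:
    issues = []

    # Schema check: every required column must be present.
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        issues.append(f"missing columns: {sorted(missing)}")

    # Consistency check: flag excessive nulls in the measure column.
    null_rate = df["value"].isna().mean() if "value" in df else 1.0
    if null_rate > 0.05:  # illustrative threshold
        issues.append(f"null rate too high: {null_rate:.1%}")

    # Statistical check: simple z-score outlier detection.
    if "value" in df and df["value"].std(ddof=0) > 0:
        z = (df["value"] - df["value"].mean()) / df["value"].std(ddof=0)
        if (z.abs() > 4).any():
            issues.append("possible outliers (|z| > 4)")

    return issues
```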
Knowledge Graph Integration:
Integrate transformed data into the Knowledge Graph, ensuring proper versioning and adherence to existing standards.
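For orientation, a record destined for the knowledge graph might be serialized as JSON-LD using the Schema.org vocabulary; the identifiers and version value below are illustrative, not the project's actual conventions.

```python
# Illustrative JSON-LD record using the Schema.org Dataset type (identifiers are made up).
import json

node = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "@id": "https://example.org/dataset/population-by-country",
    "name": "Population by Country",
    "version": "2024-06-01",  # explicit versioning on ingest
    "isBasedOn": "https://example.org/source/un-wpp",
}

print(json.dumps(node, indent=2))
```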
Collaboration:
Work closely with cross-functional teams and relevant stakeholders.
Qualifications and Skills
Education:
Bachelor's or Master's degree in Computer Science, Data Engineering, Information Technology, or a related quantitative field.
Experience:
3+ years of proven experience as a Data Engineer, with a strong portfolio of successfully implemented data pipelines.
Programming Languages:
Proficiency in Python for data manipulation, scripting, and pipeline development.
Cloud Platforms and Tools:
Expertise in Google Cloud Platform (GCP) services, including Cloud Storage, Cloud SQL, Cloud Run, Dataflow, Pub/Sub, BigQuery, and Apigee. Proficiency with Git-based version control.
Core Competencies:
Solid understanding of data modelling, schema design, ETL, Python, and knowledge graph concepts (e.g., Schema.org, RDF, SPARQL, JSON-LD).
Experience with data validation techniques and tools.
Familiarity with CI/CD practices and the ability to work in an Agile framework.
Strong problem-solving skills and keen attention to detail.
Preferred Qualifications:
Experience with LLM-based tools or concepts for data automation (e.g., auto-schematization).
Familiarity with similar large-scale public dataset integration initiatives.
Experience with multilingual data integration.