with strong experience in building scalable data pipelines and working with modern cloud-based data ecosystems. The ideal candidate will have hands-on experience with Databricks, Apache Spark, and Google Cloud Platform (GCP), especially BigQuery, and a passion for driving data initiatives that power intelligent decision-making across the organization.
Key Responsibilities:
Design, build, and optimize large-scale, reliable data pipelines using Databricks, GCP (BigQuery), and other modern tools.
Perform advanced SQL querying, data wrangling, and complex data transformations to support analytics and machine learning initiatives.
Handle structured and semi-structured data, and apply Exploratory Data Analysis (EDA) techniques to derive insights.
Work closely with data scientists to implement and deploy data models and pipelines into production environments.
Ensure data quality, reliability, lineage, and security across the entire data pipeline lifecycle.
Participate in data architecture discussions and influence decisions around data design and storage strategy.
Contribute to data democratization by ensuring business users have access to clean and usable data.
Create detailed documentation and reusable frameworks for data ingestion, transformation, and operational workflows.
Required Skills & Qualifications:
3-6+ years of experience in a Data Engineering or similar role.
Strong expertise in Databricks and Apache Spark.
Deep hands-on experience with GCP BigQuery, including performance tuning, partitioning, and optimization.
Proficiency in advanced SQL, including complex joins, CTEs, window functions, and query optimization.
Solid experience with Python for data manipulation and developing robust pipelines.
Familiarity with data science concepts, such as feature engineering, basic model implementation, and evaluation metrics.
Knowledge of data profiling, EDA, and statistical analysis.
Sound understanding of data structures, normalization/denormalization, and metadata management.
Demonstrated understanding of how data impacts business decisions and product development.
Strong problem-solving, communication, and collaboration skills.
Education:
Bachelor's degree in Computer Science, Information Systems, Engineering, Computer Applications, or a related technical discipline.
Preferred Qualifications (Nice to Have):
Exposure to modern data orchestration tools (e.g., Airflow, dbt).
Experience working in Agile environments and cross-functional teams.
For more information or to apply, contact us at: