solutions. This role is ideal for someone who thrives at the intersection of data engineering, AI systems, and business insights, contributing to high-impact client programs.
Required Skills & Experience:
Advanced proficiency in PySpark, Apache Spark, and Databricks for batch and streaming data pipelines.
Strong experience with SQL for data analysis, transformation, and modeling.
Expertise in data visualization and dashboarding tools (Power BI, Tableau, Looker).
Solid understanding of data warehouse design, relational databases (PostgreSQL, Snowflake, SQL Server), and data lakehouse architectures.
Exposure to Generative AI, RAG, embedding models, and vector databases (e.g., FAISS, Pinecone, ChromaDB).
Experience with agentic AI frameworks such as LangChain, Haystack, CrewAI, or similar.
Familiarity with cloud services for data and AI (Azure, AWS, or GCP).
Excellent problem-solving and collaboration skills with an ability to bridge engineering and business needs.
Preferred Skills:
Experience with MLflow, Delta Live Tables, or other Databricks-native AI tools.
Understanding of prompt engineering, LLM deployment, and multi-agent orchestration.
Knowledge of CI/CD, Git, Docker, and DevOps pipelines.
Awareness of Responsible AI, data privacy regulations, and enterprise data compliance.
Background in consulting, enterprise analytics, or AI/ML product development.
Key Responsibilities:
Design, build, and optimize distributed data pipelines using PySpark, Apache Spark, and Databricks to support both analytics and AI workloads.
Support RAG pipelines, embedding generation, and data pre-processing for LLM applications.
Create and maintain interactive dashboards and BI reports using Power BI, Tableau, or Looker for business stakeholders and consultants.
Conduct ad hoc data analysis to drive data-driven decision making and enable rapid insight generation.
Develop and maintain robust data warehouse schemas and star/snowflake models, and support data lake architectures.
Integrate with and support LLM agent frameworks such as LangChain, LlamaIndex, Haystack, or CrewAI for intelligent workflow automation.
Ensure data pipeline monitoring, cost optimization, and scalability in cloud environments (Azure/AWS/GCP).
Collaborate with cross-functional teams including AI scientists, analysts, and business teams to drive use-case delivery.
Maintain strong data governance, lineage, and metadata management practices using tools like Azure Purview or DataHub.