Data Scientist

KA, IN, India

Job Description

Data Scientist (Vector Search + GCP) - Bangalore

Bangalore

Data/Applied Scientist (Search)

Strong Python skills, with experience in Jupyter notebooks and Python packages such as
polars, pandas, NumPy, scikit-learn, and matplotlib

Must have: Experience with machine learning lifecycle, including data
preparation, training, evaluation, and deployment

Must have: Hands-on experience with GCP services for ML & data science

Must have: Experience with vector search, hybrid search techniques, and query preprocessing

Must have: Experience with embedding generation using models like BERT, Sentence
Transformers, or custom models

Must have: Experience in embedding indexing and retrieval (e.g.,
Elasticsearch, FAISS, ScaNN, Annoy)

Must have: Experience with LLMs and use cases like RAG (Retrieval-Augmented Generation)

Must have: Understanding of semantic vs. lexical search paradigms

Must have: Experience with Learning to Rank (LTR) techniques and libraries (e.g., XGBoost,
LightGBM with LTR support)

Should be proficient in SQL and BigQuery for analytics and feature generation

Should have experience with Dataproc clusters for distributed data processing using Apache
Spark or PySpark

Should have experience deploying models and services using Vertex AI, Cloud Run, or Cloud
Functions

Should be comfortable working with BM25 ranking (via Elasticsearch or OpenSearch) and
blending with vector-based approaches

Good to have: Familiarity with Vertex AI Matching Engine for scalable vector retrieval

Good to have: Familiarity with TensorFlow Hub, Hugging Face, or other model repositories

Good to have: Experience with prompt engineering, context windowing, and embedding
optimization for LLM-based systems

Should understand how to build end-to-end ML pipelines for search and ranking applications

Must have: Awareness of evaluation metrics for search relevance
(e.g., precision@k, recall, nDCG, MRR)

Should have exposure to CI/CD pipelines and model versioning practices

GCP Tools Experience:

ML & AI: Vertex AI, Vertex AI Matching Engine, AutoML, AI Platform

Storage: BigQuery, Cloud Storage, Firestore

Ingestion: Pub/Sub, Cloud Functions, Cloud Run

Search: Vector Databases (e.g., Matching Engine, Qdrant on GKE), Elasticsearch/OpenSearch

Compute: Cloud Run, Cloud Functions, Vertex Pipelines, Cloud Dataproc (Spark/PySpark)

CI/CD & IaC: GitLab/GitHub Actions

Job Type: Full-time

Pay: Up to ₹1,700,000.00 per year

Work Location: In person


Job Detail

  • Job Id
    JD4225828
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    KA, IN, India
  • Education
    Not mentioned
  • Experience
    Year