DS (Vector Search + GCP) - Bangalore
Bangalore
JOB DESCRIPTION
Data/Applied Scientist (Search)
Strong in Python, with experience in Jupyter notebooks and Python packages such as
polars, pandas, NumPy, scikit-learn, and matplotlib
Must have: Experience with the machine learning lifecycle, including data
preparation, training, evaluation, and deployment
Must have: Hands-on experience with GCP services for ML & data science
Must have: Experience with vector search, hybrid search techniques, and query preprocessing
Must have: Experience with embedding generation using models like BERT, Sentence
Transformers, or custom models
Must have: Experience in embedding indexing and retrieval (e.g., Elasticsearch, FAISS, ScaNN, Annoy)
Must have: Experience with LLMs and use cases like RAG (Retrieval-Augmented Generation)
Must have: Understanding of semantic vs lexical search paradigms
Must have: Experience with Learning to Rank (LTR) techniques and libraries (e.g., XGBoost,
LightGBM with LTR support)
Should be proficient in SQL and BigQuery for analytics and feature generation
Should have experience with Dataproc clusters for distributed data processing using Apache
Spark or PySpark
Should have experience deploying models and services using Vertex AI, Cloud Run, or Cloud
Functions
Should be comfortable working with BM25 ranking (via Elasticsearch or OpenSearch) and
blending it with vector-based approaches
Good to have: Familiarity with Vertex AI Matching Engine for scalable vector retrieval
Good to have: Familiarity with TensorFlow Hub, Hugging Face, or other model repositories
Good to have: Experience with prompt engineering, context windowing, and embedding
optimization for LLM-based systems
Should understand how to build end-to-end ML pipelines for search and ranking applications
Must have: Awareness of evaluation metrics for search relevance
(e.g., precision@k, recall, nDCG, MRR)
Should have exposure to CI/CD pipelines and model versioning practices
GCP Tools Experience:
ML & AI: Vertex AI, Vertex AI Matching Engine, AutoML, AI Platform
Storage: BigQuery, Cloud Storage, Firestore
Ingestion: Pub/Sub, Cloud Functions, Cloud Run
Search: Vector Databases (e.g., Matching Engine, Qdrant on GKE), Elasticsearch/OpenSearch
Compute: Cloud Run, Cloud Functions, Vertex Pipelines, Cloud Dataproc (Spark/PySpark)
CI/CD & IaC: GitLab CI/CD, GitHub Actions
Job Type: Full-time
Pay: Up to ₹1,700,000.00 per year
Work Location: In person