who can build scalable and intelligent systems using FastAPI, FAISS, and vector search technologies. If you're passionate about retrieval-augmented generation (RAG), semantic search, and deploying real-world ML pipelines, we'd love to meet you!
Key Responsibilities:
Build, optimize, and maintain scalable ML APIs using FastAPI
Implement and manage vector-based search systems using FAISS, Pinecone, or similar tools
Create and serve embeddings using transformer models for semantic search, recommendations, or chatbot systems
Design and deploy RAG (retrieval-augmented generation) pipelines using LLMs
Work with large unstructured datasets (text, documents, etc.) to extract features and build indexes
Collaborate with data scientists and backend engineers to integrate models into production
Monitor and improve model performance and API latency in production environments
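To give a flavor of the vector-search work above, here is a minimal, purely illustrative sketch of cosine-similarity search over toy embeddings. It uses brute-force NumPy in place of a real FAISS index (FAISS's IndexFlatIP performs the same inner-product search over normalized vectors, at scale); all names and the toy data are hypothetical, not part of any actual codebase for this role.

```python
import numpy as np

def build_index(embeddings: np.ndarray) -> np.ndarray:
    # Normalize rows so that inner product equals cosine similarity
    # (the same convention used with FAISS's IndexFlatIP for cosine search).
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)

def search(index: np.ndarray, query: np.ndarray, k: int = 2):
    # Normalize the query, score every document, return the top-k matches.
    q = query / max(np.linalg.norm(query), 1e-12)
    scores = index @ q                 # cosine similarity per document
    top = np.argsort(-scores)[:k]     # indices of the k best matches
    return top.tolist(), scores[top].tolist()

# Toy 4-dimensional "embeddings" standing in for transformer outputs.
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
], dtype=np.float32)

index = build_index(docs)
ids, scores = search(index, np.array([1.0, 0.05, 0.0, 0.0], dtype=np.float32))
```

In production the brute-force matrix product is replaced by an approximate index (IVF, HNSW) so latency stays flat as the corpus grows.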
Required Skills:
Strong experience with Python and FastAPI
Hands-on experience with FAISS and vector similarity search concepts
Familiarity with sentence-transformers, Hugging Face, or OpenAI embeddings
Working knowledge of vector databases and search libraries such as FAISS, Weaviate, Pinecone, or Qdrant
Experience with NLP pipelines, tokenization, and text preprocessing
Comfortable with RESTful APIs, JSON, and basic cloud deployment (e.g., AWS/GCP)
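For candidates unfamiliar with the term, the RAG architecture mentioned below follows a simple retrieve-then-generate shape. The sketch below shows only that structure: token-overlap ranking stands in for embedding retrieval, and the `generate` stub stands in for a real LLM call — every function here is hypothetical and for illustration only.

```python
def tokenize(text: str) -> set[str]:
    # Crude whitespace tokenizer; real pipelines use model tokenizers.
    return set(text.lower().split())

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank documents by token overlap with the query (a stand-in for
    # embedding similarity against a vector index).
    scored = sorted(corpus, key=lambda d: -len(tokenize(d) & tokenize(query)))
    return scored[:k]

def generate(prompt: str) -> str:
    # Stub standing in for a real LLM call (hosted API or local model).
    return f"Answer based on: {prompt}"

def rag_answer(query: str, corpus: list[str]) -> str:
    # Retrieve supporting context, then condition generation on it.
    context = "\n".join(retrieve(query, corpus))
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)

corpus = [
    "FAISS builds vector indexes for similarity search.",
    "FastAPI serves Python APIs with automatic docs.",
    "Paris is the capital of France.",
]
answer = rag_answer("how does vector similarity search work", corpus)
```

The value of the pattern is that the generator only ever sees retrieved context, which keeps answers grounded in the indexed corpus rather than in the model's parametric memory.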
Bonus/Good to Have:
Experience with LangChain, LLMs, or RAG architecture
MLOps practices for model serving and monitoring
Exposure to Docker, Kubernetes, Airflow, or CI/CD pipelines
Prior experience deploying LLM-based search or Q&A systems
Ideal Candidate Profile:
2-4 years of experience in AI/ML engineering, or in backend roles with an ML focus
Comfortable building APIs that connect machine learning models to real users
Familiar with working in fast-paced, agile environments
Curious about LLMs, embeddings, and real-time AI applications
Job Type: Full-time