for our Deep Learning, LLM, and Vision-Language Model (VLM) products. You will define how we test, measure, document, and communicate AI quality, working closely with ML, Engineering, and Product teams in a fast-paced startup environment.
This role is ideal for someone who believes clear documentation is as critical as good testing, especially for non-deterministic AI systems.
What You'll Do
Own Quality & Documentation End-to-End
Define testing strategy for LLMs, VLMs, and DL pipelines.
Create and maintain clear, lightweight documentation covering:
+ Model testing strategies and assumptions
+ Evaluation metrics and acceptance criteria
+ Known limitations, risks, and failure modes
+ Release readiness and quality sign-off
Ensure documentation evolves with models, data, and prompts.
, regression test suites, and test result summaries.
Document prompt behaviour, edge cases, and known model quirks.
Vision & Multimodal Testing
Test VLMs for image-text alignment, OCR, captioning, and reasoning.
Document model performance across different image types, quality levels, and domains.
Track and publish model behaviour changes between versions.
Automation, MLOps & Reporting
Build Python-based automation for evaluation and regression testing.
Integrate tests into CI/CD and MLOps pipelines.
Produce readable quality reports and dashboards for engineers and leadership.
Monitor and document production issues such as model/data drift and degradation.
Build a Quality-First Culture
Establish QA and documentation standards that scale with a startup.
Mentor engineers on writing testable code and meaningful documentation.
Act as the single source of truth for AI quality, testing, and known risks.
What We're Looking For
Must-Have
Strong background in software testing with lead or ownership experience.
Hands-on experience testing LLMs, DL models, or GenAI systems.
Strong Python skills for test automation and data validation.
Proven ability to write clear, structured technical documentation.
Understanding of:
+ Transformer-based models and DL workflows
+ Model evaluation metrics and non-deterministic system testing
Comfortable working in ambiguity and moving fast in a startup.
Nice-to-Have
Experience with VLMs, multimodal models, or computer vision.
Exposure to RAG architectures, vector databases, and embeddings.
Familiarity with tools like LangChain, LlamaIndex, MLflow, or similar.
Experience documenting AI risks, limitations, or compliance requirements.
Interested candidates can apply to nanda.k@telradsol.com