AI Evaluation Specialist (QA)

TS, IN, India

Job Description

Kore.ai is a pioneering force in enterprise AI transformation, empowering organizations through our comprehensive agentic AI platform. With innovative offerings across "AI for Service," "AI for Work," and "AI for Process," we're enabling more than 400 Global 2000 companies to fundamentally reimagine their operations, customer experiences, and employee productivity.


Our end-to-end platform enables enterprises to build, deploy, manage, monitor, and continuously improve agentic applications at scale. We've automated over 1 billion interactions every year with voice and digital AI in customer service, and transformed employee experiences for tens of thousands of employees through productivity and AI-driven workflow automation.


Recognized as a leader by Gartner, Forrester, IDC, ISG, and Everest, Kore.ai has secured Series D funding of $150M, including strategic investment from NVIDIA to drive Enterprise AI innovation. Founded in 2014 and headquartered in Florida, we maintain a global presence with offices in India, UK, Germany, Korea, and Japan.


You can find full press coverage at https://kore.ai/press/.

POSITION: Senior AI Evaluation Specialist



POSITION SUMMARY:

We are seeking a Senior AI Evaluation Specialist to design and execute robust evaluation methodologies for Generative and Agentic AI systems. This role bridges AI product quality, evaluation science, and responsible AI governance, ensuring every AI feature, agent, and model release is measured, benchmarked, and validated using standardized frameworks.

The ideal candidate combines a QA mindset, ML evaluation rigor, and hands-on coding expertise to benchmark LLMs, multi-agent workflows, and GenAI APIs, driving consistent, measurable, and safe AI product performance.

LOCATION: Hyderabad (Work from Office)



RESPONSIBILITIES:



1. AI Evaluation & Benchmarking

  • Build and maintain end-to-end evaluation pipelines for Generative and Agentic AI features (e.g., chat, reasoning agents, RAG workflows, summarization, classification).
  • Implement standardized evaluation frameworks such as RAGAS, G-Eval, HELM, PromptBench, MT-Bench, or custom evaluation harnesses.
  • Define and measure core AI quality metrics: accuracy, groundedness, coherence, contextual recall, hallucination rate, and response time.
  • Create reproducible benchmarks, leaderboards, and regression tracking for models and agents across multiple releases or providers (OpenAI, Anthropic, Mistral, etc.).
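As a minimal sketch of the kind of custom evaluation harness this work involves: the code below scores model responses for exact-match accuracy and a crude token-overlap groundedness proxy. The dataclass fields and scoring logic are illustrative assumptions, not a Kore.ai implementation; production pipelines would use frameworks such as RAGAS and embedding-based metrics.

```python
from dataclasses import dataclass

@dataclass
class EvalCase:
    question: str
    context: str    # retrieved source text (used for groundedness)
    reference: str  # expected answer
    response: str   # model output under test

def token_overlap(a: str, b: str) -> float:
    """Fraction of tokens in `a` that also appear in `b` (crude groundedness proxy)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a) if tokens_a else 0.0

def evaluate(cases: list[EvalCase]) -> dict[str, float]:
    """Aggregate per-case scores into release-level metrics for regression tracking."""
    accuracy = sum(c.response.strip().lower() == c.reference.strip().lower()
                   for c in cases) / len(cases)
    groundedness = sum(token_overlap(c.response, c.context) for c in cases) / len(cases)
    return {"accuracy": round(accuracy, 3), "groundedness": round(groundedness, 3)}
```

Running the same harness against each release or provider yields directly comparable numbers, which is what makes leaderboards and regression tracking possible.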

2. Agentic AI Evaluation

  • Evaluate multi-agent systems and autonomous AI workflows, measuring task success rates, reasoning trace quality, and tool-use efficiency.
  • Assess Agentic AI behaviors such as planning accuracy, goal completion rate, context handoff success, and inter-agent communication reliability.
  • Validate decision-making transparency and error recovery mechanisms in autonomous agent frameworks (LangGraph, AutoGen, CrewAI, etc.).
  • Design agent-specific evaluation scenarios: simulated environments, user-in-the-loop testing, and "mission-based" performance scoring.

3. Experimentation & Automation

  • Develop Python-based evaluation scripts to automate testing using OpenAI, Anthropic, and Hugging Face APIs.
  • Conduct large-scale comparative studies across prompts, models, and fine-tuned variants, analyzing quantitative and qualitative differences.
  • Integrate evaluations into CI/CD pipelines to enable continuous AI quality monitoring.
  • Visualize results using dashboards (Plotly, Streamlit, Dash, or Grafana).

4. Quality Governance & Reporting

  • Define and enforce AI acceptance thresholds before deployment.
  • Collaborate with Responsible AI teams to evaluate bias, fairness, safety, and privacy implications.
  • Produce detailed evaluation reports and audit logs for model releases and governance boards.
  • Present findings to Product, Data Science, and Executive stakeholders, transforming metrics into actionable insights.
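Acceptance-threshold gating of this kind is straightforward to express in code. The sketch below is a hypothetical example; the metric names, threshold values, and pass/fail rules are placeholders, not actual Kore.ai governance policy.

```python
# Hypothetical acceptance thresholds; real values would come from governance policy.
THRESHOLDS = {"accuracy": 0.90, "groundedness": 0.85, "hallucination_rate": 0.05}

def gate_release(metrics: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (passes, violations). Metrics where lower is better (e.g.
    hallucination_rate) are compared with <=; all others with >=."""
    lower_is_better = {"hallucination_rate"}
    failures = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif name in lower_is_better and value > limit:
            failures.append(f"{name}: {value} > {limit}")
        elif name not in lower_is_better and value < limit:
            failures.append(f"{name}: {value} < {limit}")
    return (not failures, failures)
```

Wired into a CI/CD pipeline, a check like this blocks deployment automatically when any metric regresses, and the violation list doubles as an audit-log entry.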

5. Collaboration & Continuous Improvement

  • Work closely with Prompt Engineers, ML Scientists, and QA Engineers to close the loop between testing and improvement.
  • Support Product teams in defining evaluation-driven release criteria.
  • Mentor junior evaluators in AI testing methodologies, benchmarking, and analysis.
  • Keep abreast of advances in LLM evaluation research, Agentic AI frameworks, and tool-calling reliability testing.

QUALIFICATIONS / SKILLS REQUIRED:



  • Programming: Python (Pandas, NumPy, LangChain, LangGraph, OpenAI/Anthropic SDKs)
  • Evaluation Frameworks: RAGAS, HELM, G-Eval, MT-Bench, PromptBench, custom scoring pipelines
  • GenAI APIs: OpenAI GPT-4/5, Claude, Gemini, Mistral, Azure OpenAI
  • Agentic AI: Understanding of multi-agent orchestration, tool use, reasoning traces, and planning frameworks (AutoGen, CrewAI, LangGraph)
  • Metrics Knowledge: BLEU, ROUGE, cosine similarity, factuality, coherence, bias, toxicity, reasoning success rate
  • Data & Analytics: JSON parsing, prompt dataset curation, result visualization
  • Tooling: Git, Jupyter/Colab, Jira, Confluence, evaluation dashboards
  • Soft Skills: Analytical communication, documentation excellence, cross-team collaboration
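As one concrete illustration of the metrics knowledge listed above, cosine similarity can be sketched over bag-of-words term counts. This is a dependency-free simplification for illustration only; real evaluation pipelines typically compute cosine similarity over embedding vectors from a model.

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity over bag-of-words term counts. Embedding models would
    normally supply the vectors; word counts keep this sketch self-contained."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0
```

Identical texts score 1.0 and texts with no shared tokens score 0.0, which makes the metric useful as a quick semantic-overlap signal between a model response and a reference answer.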

EDUCATION QUALIFICATION:



Bachelor's or Master's degree in Computer Science, AI, Data Science, or a related discipline. 5 to 10 years of total experience, with at least 3 years in AI evaluation, GenAI QA, or LLM quality analysis. Strong understanding of the AI/ML model lifecycle, prompt engineering, and RAG or agentic architectures. Experience contributing to AI safety, reliability, or responsible AI initiatives.



Job Detail

  • Job Id: JD5039904
  • Industry: Not mentioned
  • Total Positions: 1
  • Job Type: Full Time
  • Salary: Not mentioned
  • Employment Status: Permanent
  • Job Location: TS, IN, India
  • Education: Not mentioned
  • Experience: Year