LLM Reliability & Evaluation Engineer

PB, IN, India

Job Description

Job Information




  • Date Opened
    08/08/2025
  • Job Type
    Full time
  • Industry
    Technology
  • Work Experience
    1-3 years
  • City
    Mohali
  • State/Province
    Punjab
  • Country
    India
  • Zip/Postal Code
    160071




ABOUT XENONSTACK





XenonStack is the fastest-growing data and AI foundry for agentic systems, enabling people and organizations to gain real-time and intelligent business insights.

Agentic Systems for AI Agents: akira.ai

Vision AI Platform: xenonstack.ai

Inference AI Infrastructure for Agentic Systems: nexastack.ai

THE OPPORTUNITY





We are looking for an LLM Reliability & Evaluation Engineer to design, implement, and maintain rigorous evaluation frameworks for Large Language Models (LLMs) powering our Agentic AI systems. This role will ensure models meet high standards of accuracy, safety, and performance across enterprise and regulated industry use cases.



If you're passionate about model trustworthiness, benchmarking, and Responsible AI practices, and want to shape how AI agents behave in mission-critical workflows, this is the role for you.

RESPONSIBILITIES



  • Design, implement, and maintain evaluation pipelines for LLM-based applications and agentic workflows.
  • Define and track key performance indicators (accuracy, latency, cost, reliability) for deployed models.
  • Develop automated test suites, benchmark datasets, and stress-testing scenarios for LLMs.
  • Collaborate with data scientists, ML engineers, and product teams to integrate evaluation into the model lifecycle.
  • Assess bias, fairness, and safety risks in LLM outputs and recommend mitigations.
  • Validate model alignment to enterprise use case requirements and regulatory standards.
  • Conduct A/B tests, prompt performance analysis, and long-context reliability checks.
  • Document evaluation methodologies and maintain transparent reporting for internal and client use.
  • Stay updated on state-of-the-art LLM evaluation techniques, frameworks, and metrics.

SKILLS & QUALIFICATIONS



Must-Have:



  • 3-5 years of experience in AI/ML engineering, applied research, or QA for ML systems.
  • Strong understanding of LLM architectures, prompt engineering, and agentic workflows.
  • Proficiency in Python and ML frameworks (PyTorch, TensorFlow, Hugging Face).
  • Experience with dataset curation, evaluation metrics (BLEU, ROUGE, BERTScore, factuality scores), and performance profiling.
  • Familiarity with Responsible AI principles, fairness auditing, and bias detection methods.

Good-to-Have:



  • Experience with LangChain, LangGraph, or similar agent frameworks.
  • Knowledge of model monitoring tools (Weights & Biases, MLflow, Arize AI, TruLens).
  • Familiarity with multi-turn conversation evaluation and human-in-the-loop testing.
  • Exposure to regulated industry AI governance (finance, healthcare, etc.).

CAREER GROWTH & BENEFITS



Continuous Learning & Growth



  • Certification sponsorships and advanced training in AI evaluation, safety, and optimization.
  • Access to cutting-edge AI systems and enterprise-scale evaluation environments.

Recognition & Rewards



  • Performance incentives and awards for innovation in model reliability.
  • Fast-track opportunities to AI Governance or Model Ops leadership roles.

Work Benefits & Well-Being



  • Comprehensive medical insurance and project-based allowances.
  • Cab facilities for women employees and additional perks for special projects.

XENONSTACK CULTURE - JOIN US & MAKE AN IMPACT!





We foster a culture of cultivation with bold, human-centric leadership principles. We value obsession and deep work in every initiative, and we are on a mission to reshape how enterprises adopt AI + Human Intelligence systems.

Product Values:



  • Obsessed with Adoption - Making AI accessible and enterprise-ready.
  • Obsessed with Simplicity - Turning complexity into seamless, intuitive AI experiences.


Be a part of our vision to accelerate the world's transition to AI + Human Intelligence.



Job Detail

  • Job Id
    JD4019715
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    PB, IN, India
  • Education
    Not mentioned
  • Experience
    Year