XenonStack is the fastest-growing data and AI foundry for agentic systems, enabling people and organizations to gain real-time and intelligent business insights.
Agentic Systems for AI Agents: akira.ai
Vision AI Platform: xenonstack.ai
Inference AI Infrastructure for Agentic Systems: nexastack.ai
THE OPPORTUNITY
-------------------
We are seeking an Agentic Infrastructure Observability Engineer to design, implement, and maintain visibility, monitoring, and assurance systems for large-scale AI agent deployments. This role focuses on observability, telemetry, and evaluation pipelines across multi-agent and multi-context workflows, ensuring AI systems are measurable, trustworthy, and compliant in enterprise and regulated environments.
If you're passionate about SRE principles for AI, LLM evaluation, and agentic system transparency, this role offers the chance to shape observability for the next generation of intelligent automation.
RESPONSIBILITIES
--------------------
Design and Implement Telemetry Pipelines
Build observability infrastructure to capture logs, metrics, traces, and behavioral data from AI agents, orchestration layers, and integrated tools.
Develop Evaluation Dashboards & KPIs
Track accuracy, latency, reliability, cost, token usage, and success rates for agentic workflows.
Enable Full-Stack Tracing
Build execution flow tracing for multi-agent, multi-tool pipelines, with attribution for each decision, prompt, and retrieval step.
Monitor Behavioral Reliability
Detect and flag hallucinations, decision drift, prompt degradation, or tool misuse in real time.
Integrate with Evaluation Frameworks
Work with LLM eval tools like TruLens, Ragas, Arize AI, and custom scoring systems for continuous quality monitoring.
Ensure Compliance & Auditability
Implement observability features for regulatory audits (e.g., PCI-DSS, GDPR), including secure logging of prompts, retrieved context, and decisions.
Cost & Resource Observability
Track model/API usage, compute cost, and token consumption to enable optimization decisions.
Collaborate Across Teams
Partner with AgentOps Engineers, AI Interaction Engineers, and Model Reliability teams to turn observability insights into operational improvements.
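To make the telemetry and tracing responsibilities above concrete, here is a minimal stdlib-only sketch of capturing one traced agent step with latency and token attribution. All names (`AgentSpan`, `record_step`, the attribute keys) are illustrative assumptions, not an existing API; a production pipeline would emit OpenTelemetry spans to a collector rather than printing JSON lines.

```python
import json
import time
import uuid
from dataclasses import dataclass, field, asdict

@dataclass
class AgentSpan:
    """One traced step (prompt, tool call, retrieval) in an agent workflow."""
    name: str
    trace_id: str                     # shared across all steps of one workflow run
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: dict = field(default_factory=dict)
    start: float = field(default_factory=time.time)
    duration_ms: float = 0.0

def record_step(trace_id, name, fn, **attrs):
    """Run one agent step and emit a span with latency and custom attributes."""
    span = AgentSpan(name=name, trace_id=trace_id, attributes=attrs)
    result = fn()
    span.duration_ms = (time.time() - span.start) * 1000
    print(json.dumps(asdict(span)))   # stand-in for shipping to a log/trace backend
    return result, span

# Example: trace a hypothetical retrieval step with token attribution.
trace_id = uuid.uuid4().hex
result, span = record_step(
    trace_id, "retrieval",
    lambda: ["doc-1", "doc-2"],
    agent="planner", tokens_in=812, tokens_out=64,
)
```

Keeping a single `trace_id` across every prompt, tool call, and retrieval step is what makes per-decision attribution possible when reconstructing a multi-agent execution flow.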
SKILLS & QUALIFICATIONS
----------------------------
Must-Have:
3-5 years in SRE, DevOps, AI infrastructure, or ML systems engineering.
Proficiency in Python and observability stacks (Prometheus, OpenTelemetry, Grafana, ELK, etc.).
Familiarity with LLM architectures, multi-agent orchestration frameworks (LangGraph, LangChain, AgentBridge), and context pipelines.
Experience with logging, tracing, and performance profiling for distributed systems.
Understanding of LLM evaluation metrics (factuality, coherence, toxicity, cost efficiency).
Knowledge of privacy and compliance standards for AI systems.
Good-to-Have:
Hands-on experience with LLM eval tools (TruLens, Ragas, Arize AI, Weights & Biases).
Familiarity with RAG, vector databases, and knowledge graph-based retrieval.
Experience in regulated industries (BFSI, healthcare, GRC).
Background in anomaly detection or behavioral monitoring for ML systems.
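As a small illustration of the cost-efficiency side of the metrics listed above, the sketch below derives dollar cost from token counts and turns it into a completions-per-dollar figure. The model name and per-1K-token prices are hypothetical placeholders; real prices vary by provider and model.

```python
# Hypothetical per-1K-token prices (USD); real provider pricing differs.
PRICES = {"gpt-large": {"in": 0.01, "out": 0.03}}

def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one LLM call, computed from token counts."""
    p = PRICES[model]
    return tokens_in / 1000 * p["in"] + tokens_out / 1000 * p["out"]

def cost_efficiency(successes: int, calls: list[tuple]) -> float:
    """Successful task completions per dollar spent across a workflow."""
    total = sum(call_cost(*c) for c in calls)
    return successes / total if total else float("inf")

# Two calls from one workflow run that completed 2 tasks.
calls = [("gpt-large", 1200, 300), ("gpt-large", 800, 150)]
eff = cost_efficiency(successes=2, calls=calls)
```

Tracking this ratio per workflow (rather than raw spend alone) is one way to surface which agent configurations are worth their cost.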
CAREER GROWTH & BENEFITS
-----------------------------
Continuous Learning & Growth
Training and certifications in AI observability, LLM evaluation, and Responsible AI.
Hands-on exposure to enterprise-scale agentic infrastructure.
Recognition & Rewards
Incentives for innovations in AI observability and monitoring.
Fast-track opportunities into AI Reliability Architecture or Model Ops Leadership roles.
Work Benefits & Well-Being
Comprehensive medical insurance and project-based allowances.
Cab facilities for women employees and special project perks.