Agentic Infrastructure Observability Engineer

PB, IN, India

Job Description

Job Information




Date Opened: 08/08/2025
Job Type: Full time
Industry: Technology
Work Experience: 3+ Years
City: Mohali
State/Province: Punjab
Country: India
Zip/Postal Code: 160071







ABOUT XENONSTACK


--------------------



XenonStack is the fastest-growing data and AI foundry for agentic systems, enabling people and organizations to gain real-time and intelligent business insights.

Agentic Systems for AI Agents: akira.ai

Vision AI Platform: xenonstack.ai

Inference AI Infrastructure for Agentic Systems: nexastack.ai

THE OPPORTUNITY


-------------------



We are seeking an Agentic Infrastructure Observability Engineer to design, implement, and maintain visibility, monitoring, and assurance systems for large-scale AI agent deployments.



This role focuses on observability, telemetry, and evaluation pipelines across multi-agent and multi-context workflows, ensuring AI systems are measurable, trustworthy, and compliant in enterprise and regulated environments.



If you're passionate about SRE principles for AI, LLM evaluation, and agentic system transparency, this role offers the chance to shape observability for the next generation of intelligent automation.

RESPONSIBILITIES


--------------------

Design and Implement Telemetry Pipelines



Build observability infrastructure to capture logs, metrics, traces, and behavioral data from AI agents, orchestration layers, and integrated tools.

Develop Evaluation Dashboards & KPIs



Track accuracy, latency, reliability, cost, token usage, and success rates for agentic workflows.

Enable Full-Stack Tracing



Build execution flow tracing for multi-agent, multi-tool pipelines, with attribution for each decision, prompt, and retrieval step.

Monitor Behavioral Reliability



Detect and flag hallucinations, decision drift, prompt degradation, or tool misuse in real time.

Integrate with Evaluation Frameworks



Work with LLM eval tools like TruLens, Ragas, Arize AI, and custom scoring systems for continuous quality monitoring.

Ensure Compliance & Auditability



Implement observability features for regulatory audits (e.g., PCI-DSS, GDPR), including secure logging of prompts, retrieved context, and decisions.

Cost & Resource Observability



Track model/API usage, compute cost, and token consumption to enable optimization decisions.

Collaborate Across Teams



Partner with AgentOps Engineers, AI Interaction Engineers, and Model Reliability teams to turn observability insights into operational improvements.

SKILLS & QUALIFICATIONS


----------------------------

Must-Have:



  • 3-5 years in SRE, DevOps, AI infrastructure, or ML systems engineering.
  • Proficiency in Python and observability stacks (Prometheus, OpenTelemetry, Grafana, ELK, etc.).
  • Familiarity with LLM architectures, multi-agent orchestration frameworks (LangGraph, LangChain, AgentBridge), and context pipelines.
  • Experience with logging, tracing, and performance profiling for distributed systems.
  • Understanding of LLM evaluation metrics (factuality, coherence, toxicity, cost efficiency).
  • Knowledge of privacy and compliance standards for AI systems.

Good-to-Have:



  • Hands-on experience with LLM eval tools (TruLens, Ragas, Arize AI, Weights & Biases).
  • Familiarity with RAG, vector databases, and knowledge graph-based retrieval.
  • Experience in regulated industries (BFSI, healthcare, GRC).
  • Background in anomaly detection or behavioral monitoring for ML systems.

CAREER GROWTH & BENEFITS


-----------------------------

Continuous Learning & Growth



Training and certifications in AI observability, LLM evaluation, and Responsible AI. Hands-on exposure to enterprise-scale agentic infrastructure.

Recognition & Rewards



Incentives for innovations in AI observability and monitoring. Fast-track opportunities into AI Reliability Architecture or Model Ops Leadership roles.

Work Benefits & Well-Being



Comprehensive medical insurance and project-based allowances. Cab facilities for women employees and special project perks.

XENONSTACK CULTURE - JOIN US & MAKE AN IMPACT!


---------------------------------------------------



We foster a culture of cultivation with bold, human-centric leadership principles. We value deep work, experimentation, and ownership in every initiative, and we are on a mission to reshape how enterprises adopt AI + Human Intelligence systems.

Product Values:



Obsessed with Adoption - Making AI accessible and enterprise-ready.

Obsessed with Simplicity - Turning complexity into seamless, intuitive AI experiences.


Be a part of our vision to accelerate the world's transition to AI + Human Intelligence.



Job Detail

  • Job Id
    JD4019707
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    PB, IN, India
  • Education
    Not mentioned
  • Experience
    3+ Years