Industry: Software Development
Seniority level: Mid-Senior level
Min Experience: 10 years
Location: Bengaluru
Job Type: Full-time
We are building a next-generation AI observability and trust platform that enables enterprises to safely deploy, monitor, and improve AI systems at scale, across traditional ML models, LLMs, generative AI, and agentic workflows.
As a Principal Platform Engineer, you will architect and develop the backbone of this platform: high-performance backend services, distributed systems, and cloud-native infrastructure that power AI evaluation, monitoring, and reliability at production scale. This is a hands-on, high-ownership role where you will shape platform architecture, influence engineering standards, and help define what "trustworthy AI" looks like in real-world enterprise environments.
What You'll Own
Platform & Backend Architecture
Design and build scalable backend services that power AI observability, evaluation, and governance workflows.
Architect distributed systems capable of ingesting, processing, and querying high-volume AI telemetry and evaluation data.
Develop APIs and services that expose AI performance, reliability, and risk signals to enterprise customers.
Distributed Systems & Data Infrastructure
Build systems that compute and store advanced AI evaluation metrics such as accuracy, relevance, drift, latency, and hallucination indicators.
Design resilient data pipelines using event-driven and streaming architectures.
Optimize storage and query layers for scale, performance, and cost efficiency.
Reliability, Scale & Operations
Define and improve operational standards across availability, latency, SLOs, observability, and incident response.
Lead efforts around performance tuning, failure handling, capacity planning, and system resiliency.
Embed best practices for testing, CI/CD, and production readiness into platform development.
AI Platform Evolution
Partner with product, ML, and customer teams to design new evaluation capabilities aligned with emerging AI risks and enterprise needs.
Support observability for modern AI workloads including LLMs, GenAI pipelines, and agent-based systems.
Contribute to the long-term technical roadmap for responsible and transparent AI systems.
Technical Leadership
Act as a technical multiplier by reviewing designs and code, raising engineering standards, and guiding architectural decisions.
Mentor senior and mid-level engineers, helping them grow in systems thinking and execution.
Influence platform direction without formal people management responsibilities.
What We're Looking For
Core Experience
10+ years of professional experience building backend or platform systems in production environments.
Strong hands-on expertise in Python and backend service development.
Deep understanding of distributed systems, concurrency, fault tolerance, and performance optimization.
Experience designing APIs, microservices, and data-intensive systems.
Infrastructure & Cloud
Solid experience with cloud-native architectures on AWS or GCP.
Hands-on exposure to Kubernetes, containerized workloads, and modern CI/CD pipelines.
Experience with technologies such as Postgres, Redis, Kafka, RabbitMQ, Ray, or similar systems.
Familiarity with analytical data stores like ClickHouse or Druid is a strong plus.
Leadership & Ownership
Proven ability to work autonomously and drive complex initiatives from concept to production.
Strong problem decomposition and decision-making skills in ambiguous environments.
Excellent communication skills and comfort collaborating across distributed, cross-functional teams.
A mentorship-oriented mindset with a passion for building durable systems and strong engineering culture.
Bonus Points
Experience supporting ML, LLM, or GenAI systems in production.
Familiarity with modern LLM frameworks, evaluation tooling, or AI monitoring platforms.
Background in developer platforms, infra tooling, or internal platform teams.
Why This Role Stands Out
Work on a category-defining AI platform at the intersection of backend engineering and responsible AI.
High-impact, high-ownership role with architectural influence across the stack.
Exposure to cutting-edge AI workloads without requiring an ML research background.
Opportunity to shape how enterprises build trust, transparency, and reliability into AI systems.
Key Skills
Backend Systems, Platform Engineering, Distributed Systems, Python, Cloud Infrastructure, Kubernetes, Kafka, Postgres, AI Observability, System Design, Reliability Engineering, API Design, Technical Leadership
We may use artificial intelligence (AI) tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses. These tools assist our recruitment team but do not replace human judgment. Final hiring decisions are ultimately made by humans. If you would like more information about how your data is processed, please contact us.
Beware of fraud agents! Do not pay money to get a job.
MNCJobsIndia.com will not be responsible for any payment made to a third-party. All Terms of Use are applicable.