Location: Chandigarh (On-site)
Experience: 5-7 years (AI/ML + DevOps + Observability)
Employment Type: Full-time
About the Role
We are looking for a highly skilled AIOps Engineer to design and implement AI-driven operational systems and intelligent observability platforms.
You will work across AIOps, MLOps, DevOps, SRE, and LLMOps, building self-healing infrastructure components, AI-based diagnostics, and autonomous remediation workflows. You'll contribute to architecture decisions, build production systems, and collaborate with cross-functional engineering teams.
Key Responsibilities
Design and implement AIOps pipelines for telemetry ingestion, analytics, and alerting
Build AI-driven observability capabilities for anomaly detection and incident diagnostics
Develop ML/LLM workflows using Ray, PyTorch Lightning, vLLM, or SGLang
Implement automation for:
anomaly detection
event correlation
predictive maintenance
Build and enhance self-healing infrastructure and auto-remediation runbooks
Optimize model serving and LLM inference using vLLM, Ray Serve, Triton, and Kubernetes
Implement real-time streaming pipelines using Kafka, Spark, or Flink
Integrate CI/CD for AI workflows with MLflow, Kubeflow, or Airflow
Work closely with SRE, platform, and AI engineering teams
Contribute to AIOps solution evaluation and PoCs for enterprise platforms
Participate in architecture discussions, design reviews, and performance optimization
Required Skills & Qualifications
Bachelor's or Master's in Computer Science, Engineering, or related field
5-7 years of experience across DevOps, SRE, or AI/ML infrastructure
Strong programming skills in Python (preferred), Go, or Bash
Solid experience with Docker, Kubernetes, and public cloud (AWS/GCP/Azure)
Experience with Infrastructure-as-Code (Terraform, Helm, Pulumi)
Hands-on experience with distributed compute:
Ray
PyTorch Lightning
vLLM / SGLang
Strong knowledge of observability tools:
Prometheus, Grafana
ELK / OpenSearch
OpenTelemetry
Splunk / Datadog
Experience with MLOps / LLMOps tooling:
MLflow, Kubeflow, Airflow, Argo
Experience with messaging/streaming systems:
Kafka, RabbitMQ, AWS SQS
Understanding of AI-powered automation and root-cause analysis
Preferred / Nice to Have
Experience deploying vLLM, Triton, or Ray Serve in production
Exposure to agentic AI frameworks (LangGraph, AutoGen, CrewAI, LangChain)
Hands-on exposure to SGLang for LLM orchestration
Familiarity with vector databases (Milvus, Weaviate, Pinecone) and RAG-based observability
Experience with model monitoring, drift detection, and cost optimization
Contributions to open-source AIOps or observability projects
What We Offer
Opportunity to work on next-generation autonomous operations platforms
Hands-on exposure to Ray, vLLM, SGLang, Triton, PyTorch Lightning, LangGraph
Cross-functional collaboration across AI, cloud, and platform engineering
Competitive compensation and strong growth path toward AIOps Lead/Architect roles