We are looking for a Senior Site Reliability Engineer to lead reliability strategy, scaling initiatives, and architectural decisions. This role focuses on building highly available, resilient, and secure on-premises platforms while driving automation and operational excellence.
Key Responsibilities
Architect, scale, and optimize Kubernetes and on-premises infrastructure
Define and implement reliability standards, SLOs, SLIs, and incident management practices
Lead complex incident response, root cause analysis, and system recovery
Oversee performance tuning, high availability, and disaster recovery for databases and storage
Drive platform automation and standardization using Infrastructure as Code (IaC)
Mentor engineers and influence platform engineering best practices
Skills & Requirements
Deep expertise in Linux, Kubernetes, and distributed systems
Strong hands-on experience managing large-scale production environments
Advanced knowledge of MariaDB, Redis, and Ceph storage systems
Proven leadership in incident management and resilient system design
Strong automation-first, reliability-focused engineering mindset
Ability to guide architecture, tooling, and long-term operational strategy
Job Type: Full-time
Pay: From ₹50,000.00 per month
Work Location: In person