Designation: Site Reliability Architect
Location: Turbhe Office, Mumbai
CTC: As per company norms
The Site Reliability Architect is a key leadership role, responsible for designing and implementing the architectural vision for our production systems, with a primary focus on reliability, scalability, and performance.
This individual will work closely with development, operations, and product teams to define and enforce SRE best practices, develop robust and resilient system designs, and drive the adoption of automation and observability across the organization.
Responsibilities:
Champion the architectural principles and long-term strategy for site reliability.
Design and review system architectures to ensure they meet high standards for reliability, scalability, and fault tolerance.
Define and enforce SRE practices such as Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.
Oversee the design and implementation of continuous integration/continuous deployment (CI/CD) pipelines.
Lead the response to major incidents, guiding teams through diagnosis and resolution.
Design and test disaster recovery and business continuity plans.
Collaborate with engineering teams to embed reliability into the software development lifecycle from the initial design phase.
Communicate complex technical concepts and reliability metrics to both technical and non-technical stakeholders.
Implement Chaos Engineering practices to proactively test system resilience.
Qualifications:
M.Tech/B.Tech or equivalent bachelor's degree.
Min Experience: 10 years
Max Experience: 16 years
10-16 years of experience in software engineering, systems administration, or a related role, with at least 5 years in a dedicated SRE or senior DevOps position.