Architect Site Reliability Engineering

Hyderabad, Telangana, India

Job Description


About Inspire Brands Hyderabad Support Center

Inspire Brands is disrupting the restaurant industry through digital transformation and operational efficiencies. The company's technology hub, Inspire Brands Hyderabad Support Center, India, will lead technology innovation and product development for the organization and its portfolio of distinct brands. The Inspire Brands Hyderabad Support Center will focus on developing new capabilities in data science, data analytics, eCommerce, automation, cloud computing, and information security to accelerate the company's business strategy. Inspire Brands Hyderabad Support Center will also host an innovation lab and collaborate with start-ups to develop solutions for productivity optimization, workforce management, loyalty management, payments systems, and more.



The Architect Site Reliability Engineering provides technical leadership in support of Inspire's initiatives in cloud computing, with a focus on improving efficiency, reducing toil, and increasing the uptime and availability of Inspire's cloud platforms. This individual will collaborate with peers to shape cloud application and infrastructure design, mature production readiness reviews, enhance build/test/release automation, mature observability practices and approach, and enhance platform resiliency, scalability, and recovery capabilities. The successful candidate is comfortable engaging a wide variety of technical partners and stakeholders, takes a data-driven and analytical approach to resolving problems and identifying areas of opportunity, is self-driven, and has a passion for continuous improvement.

Primary Responsibilities and Essential Functions:

  • Engage in and strengthen the application and cloud services development lifecycle, from inception, design, deployment, and operation through to refinement. Work closely with application and platform teams to ensure software releases are well designed, planned, implemented, released, and monitored.
  • Design, motivate, guide, and support the creation of software, systems, and processes to increase product reliability and organizational efficiency while optimizing resource use and cloud spend.
  • Champion and support reliability practices across the software development lifecycle through activities like architecture reviews, code reviews, creating platforms and frameworks, and capacity planning.
  • Work with senior engineering and testing team members to build tools and recommend testing strategies for problem prevention, detection, and chaos testing.
  • Mature SRE practices through activities such as establishing error budgets, providing guidance and refinement to SRE dashboards, and enhancing capabilities to proactively detect anomalies.
  • Provide design guidance and recommendations for platform improvements based on production incident analysis and root cause investigation outputs.
  • Improve service reliability through blameless post-incident reviews and the use of code, automation, or AI to respond to problems or prevent their recurrence.
  • Recognize automation opportunities, provide designs, and support the implementation and development of tools to automate routine, time-consuming, or manual jobs and processes.
  • Periodically assess current SRE practices and tools and provide recommendations for enhancements and improvements.
  • Train, guide, and mentor teammates on SRE practices and principles.
  • Design and execute strategies that ensure the scalability and elasticity of the infrastructure.
  • Perform code-level debugging of issues escalated to the team.

Minimum Experience:

  • Minimum 8 years of experience as a platform architect with advanced knowledge in the following key areas: containers, deployment architecture, benchmarking, design, and network engineering.
  • Minimum 4 years of combined experience in a DevOps, SRE, systems, and/or software development role.
  • Hands-on experience in establishing and maturing SRE practices, programs, and roadmaps.
  • Extensive experience with public cloud technologies and cloud-native architectures and solutions (Azure highly preferred).
  • Experience with Infrastructure-as-Code (IaC), DevOps, and CI/CD practices and tool chains (Terraform, GitLab, Argo CD, Jenkins).
  • Experience with configuration management tools (Ansible, Chef, and Packer).
  • Experience with container technology and orchestration (Kubernetes, Docker).
  • Experience with observability and monitoring practices and tools (OpenTelemetry, New Relic, OpsRamp, Prometheus, Grafana, Elastic Stack, Splunk, Dynatrace).
  • Deep understanding of microservice architectures, application servers, networks, and databases.
  • Excellent understanding of scalability processes and techniques.
  • Hands-on experience designing and administering high availability and high-performance environments, as well as managing large-scale deployments of traffic-heavy applications.
  • Ability to understand and support multiple complex systems without shying away from the challenge of improving them.
  • Comfortable with technical refactoring and creating technical designs to accommodate architectural evolution over time.
  • Willingness to try new technologies and integrate them with existing systems to achieve better operations overall.
  • Excellent communication and collaboration skills.

Job Detail

  • Job Id: JD3248120
  • Total Positions: 1
  • Job Type: Full Time
  • Employment Status: Permanent
  • Job Location: Hyderabad, Telangana, India