Senior Site Reliability Engineer I

KA, IN, India

Job Description

Role Description:

Senior Site Reliability Engineer





At Booking.com, our mission is to make it easier for everyone to experience the world. And while that world might feel a little farther away right now, we're busy preparing for when the world is ready to travel once more. With strategic long-term investments into what we believe the future of travel can be, we are opening career opportunities that will have a strong impact on our mission.



The core premise of SRE lies in treating operations as a software problem, where operations are concerned with availability, scalability, latency and efficiency for Booking.com's systems & services. At its core, the SRE is tasked with engineering efforts to solve complex problems, requiring a strong aptitude for developing software systems that minimize human labor through automation and increase system & service reliability.


A Booking Reliability Engineering team has full vertical ownership of a system, from the server configuration up to the application interfaces. This enables the team to have full control over a service and to avoid situations where different teams own different areas of a system and some parts fall between the cracks.


An SRE can wear several hats: at times an SRE might be part of the product development effort themselves, and at other times will act as a consultant to support and advise a product development team in implementing Booking Reliability Engineering best practices. As systems & services grow in size and complexity, so too does the operational overhead. It is a fundamental principle of SRE to break this relationship between operational toil and system size and complexity. This also requires the team to limit operations work in favor of the engineering development effort that is at the heart of Booking Reliability Engineering.


Ultimately, fundamental software engineering skills coupled with strong systems and networking knowledge will guide the SRE to create more reliable systems & services that are highly available, scale with growth, and are efficient and latency sensitive.


An SRE has the additional responsibility of fostering an active and thriving SRE community, leading by example as an advocate of engineering, reliability and security best practices.

B.Responsible.



Systems Design (SAP)



Create and evolve SAP solutions that ensure availability, scalability, latency, and efficiency across Booking.com's SAP landscape--including core applications (e.g., S/4HANA on HANA), integration tiers (e.g., SAP BTP, CPI/PO), and application interfaces (e.g., OData, RFC, IDoc)--with robust monitoring, capacity planning, and performance tuning baked in. Operate with a product mindset in the SAP domain, balancing current customer outcomes with the future roadmap; design for generalizable patterns across SAP modules and integration layers (e.g., reusable CPI packages, ABAP frameworks, Fiori components, shared observability/tooling) so solutions can be leveraged by other teams.

Technical Incident Management



+ Take ownership of how to procedurally deal with emergency situations. SRE should write the playbook on how to deal with a system/service degrading or even a full outage
+ Conduct post-mortem meetings (RFOs) to ensure learnings are applied and shared in case of incidents
+ Take part in our incident management program by participating in on-call rotation
+ Be available to provide expertise and feedback for our service health program

Automation and Toil Reduction



+ Build automation and application orchestration to prevent recurrent problems and to reduce human labor
+ Strategise and implement IT DR for Critical Applications (SAP Preferred)

Observability (Monitoring and Alerting Improvements)



+ Implement monitoring and alerting. This might not always mean writing the software itself; it could also mean creating the best practices around how to monitor and alert for a system/service
+ Engage in service capacity planning and demand forecasting, software performance analysis and system tuning

Architectural Guidance



+ Maintain holistic knowledge and understanding of a system/service instead of only knowing some fraction of the problem space
+ Create, document and implement Booking Reliability Engineering best practices
+ Collaborate with other teams and tech POs to support them in building reliable and scalable systems/services for their users and stakeholders
+ Influence the business and tech colleagues to adopt engineering, reliability and security best practices

Community Involvement



Take an active part in educating and skilling up members of our engineering community

B.Skilled.



+ Bachelor's or Master's degree
+ Around 6 - 10 years of experience in a similar role

Technical knowledge and skills

+ SAP Application Lifecycle: Oversee the full lifecycle of SAP applications: requirements, design, build, test, deployment, and operations.
+ Expertise in source control management (such as Git and Bitbucket) and infrastructure provisioning with Terraform
+ Solid hands-on experience with configuration management tools (Ansible & Puppet)
+ Deep understanding of Unix/Linux systems internals and networking; this includes topics like: kernel, shell and client-server protocols
+ Proficiency in Unix/Linux system administration (Red Hat/CentOS)
+ Networking: significant knowledge and understanding of network theory, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing
+ Extensive experience in the design, configuration and implementation of a system/service in a large-scale production environment (systems engineering and architectural skills)
+ Expertise in various AWS services and their use cases (EC2, Network, Lambda, IAM and more)
+ Eagerness to keep up with latest developments in technology
+ Connection with the worldwide SRE community
+ Exhibit the following behaviours: be curious; be data driven; have a systematic problem solving approach; constantly aiming to improve systems/services

Architectural Guidance



+ Advise product teams towards a technical solution that meets the functional, nonfunctional & architectural requirements by challenging the rationale for an application design and providing context in the wider architectural landscape
+ Set a clear direction for a technical capability by evaluating and aligning the target architecture improvements, reframing architectural designs and decisions for varied stakeholders

Critical Thinking



+ Find solutions to difficult or complex issues by applying different skills and techniques like analytical thinking, lateral thinking, and logical reasoning
+ Constructively improve existing ideas, plans and solutions by reviewing them in a critical yet constructive manner, initiating concrete improvements and articulating their rationale

Continuous Quality and Process Improvement

+ Identify opportunities for process, system and/or structural improvements, by applying an understanding of process flows and the methods that can be used to boost effectiveness and efficiency

End to End System Ownership



+ Own a service end to end by actively monitoring application health and performance, setting and monitoring relevant metrics and acting accordingly when they are violated, and guide more junior members of the team in this topic.
+ Reduce business continuity risks and bus factor by applying state-of-the-art practices and tools, and writing the appropriate documentation such as runbooks and OpDocs and guide more junior members of the team in this topic.
+ Reduce risk and obtain customer feedback by using continuous delivery and experimentation frameworks and guide more junior members of the team in this topic.
+ Independently manage an application or service by working through deployment and operations in production and guide more junior members of the team in this topic.

Effective Communication



+ Deliver clear, well-structured, and meaningful information to a target audience by using suitable communication mediums and language tailored to the audience
+ Achieve mutually agreeable solutions by staying adaptable, communicating ideas in clear, coherent language and practising active listening



Job Detail

  • Job Id
    JD5124230
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    KA, IN, India
  • Education
    Not mentioned
  • Experience
    Around 6 - 10 years