Create software that will address availability, scalability, latency, and efficiency for Bookings' systems/services
Have a product-based mindset that takes both customer needs and future roadmap plans into account. Development efforts are focussed on solving the general case within the technology or subsystem of responsibility, without ruling out that the resulting tooling or product can be leveraged by other teams
Technical Incident Management
Take ownership of how to procedurally deal with emergency situations. SRE should write the playbook on how to deal with a system/service degrading or even a full outage
Conduct post-mortem meetings (RFOs) to ensure learnings are applied and shared in case of incidents
Take part in our incident management program by participating in the on-call rotation
Be available to provide expertise and feedback for our service health program
Automation and Toil Reduction
Build automation and application orchestration to prevent recurrent problems and to reduce human labor
Observability (Monitoring and Alerting Improvements)
Implement monitoring and alerting. This does not always mean writing the software itself; it can also mean establishing best practices for how to monitor and alert on a system/service
Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
Architectural Guidance
Maintain holistic knowledge and understanding of a system/service instead of only knowing some fraction of the problem space
Create, document and implement Booking Reliability Engineering best practices.
Collaborate with other teams and tech POs to support them in building reliable and scalable systems/services for their users and stakeholders
Influence the business and tech colleagues to adopt engineering, reliability and security best practices
Community Involvement
Take an active part in educating and skilling up members of our engineering community
Requirements of special knowledge/skills
Proficiency in the core skills of a software developer: coding, large-scale software design & scaling, complexity analysis, algorithms, data structures, design patterns
Expertise in source control management (such as Git and Bitbucket) and in infrastructure provisioning with Terraform
Solid hands-on experience with configuration management tools (Ansible & Puppet)
Deep understanding of Unix/Linux systems internals and networking; this includes topics like: kernel, shell and client-server protocols
Proficiency in Unix/Linux system administration (Red Hat/CentOS)
Networking: significant knowledge and understanding of network theory, such as different protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing
Extensive experience in design, configuration and implementation of a system/service in a large-scale production environment (systems engineering and architectural skills)
Expertise in various AWS services and their use cases (EC2, Network, Lambda, IAM and more)
Eagerness to keep up with latest developments in technology
Connection with the worldwide SRE community
Exhibit the following behaviours: be curious; be data driven; have a systematic problem-solving approach; constantly aim to improve systems/services
6 to 10 years of experience in a similar role
Architectural Guidance
+ Advise product teams towards a technical solution that meets the functional, nonfunctional & architectural requirements by challenging the rationale for an application design and providing context in the wider architectural landscape
+ Set a clear direction for a technical capability by evaluating and aligning the target architecture improvements, reframing architectural designs and decisions for varied stakeholders
Critical Thinking
+ Find solutions to difficult or complex issues by applying different skills and techniques like analytical thinking, lateral thinking, and logical reasoning