Senior Site Reliability Engineer

Pune, Maharashtra, India

Job Description


Fluidra: Systems Reliability Engineer
Location: Pune (Hybrid)
Years of experience: 5+ years

We are looking for a Systems Reliability Engineer to work on critical services. Your mission will be to ensure our services are fast, highly available, scalable, and able to withstand unprecedented increases in load. The Systems Reliability Engineer will be at the heart of solving production problems. The position requires the flexibility to take a holistic approach to troubleshooting and the ability to delve deeply into technical details. You will manage a team of SREs to proactively ensure the stability, resilience and scale of our services through automation, testing and engineering. You will build on expertise from Product teams' systems/operations, cloud infrastructure (AWS), build and release engineering, software development and stress/load testing to make sure our services are available, cost optimized and fit for purpose early in the development lifecycle.

Responsibilities

  • Oversee the SRE team to ensure they are involved in every step of the application software development lifecycle, including product design, development, testing, and transition into operation.
  • Provide coaching and mentoring to the SRE team to improve their skillset, increase knowledge and set the benchmark for quality and precision engineering.
  • Work with technical roles across the department to drive the evolution of the DevOps toolchain, promoting improvements that streamline the software delivery process and demonstrating those improvements through metrics.
  • Build innovative prototypes and lead development teams to develop quality solutions by translating architectural designs into lower-level implementation details, helping implement user stories if required.
  • Take highly complex, manual processes and work to simplify and automate them.
  • With a focus on agile methodologies, the SRE Lead will oversee the automation of runbooks, incident management, monitoring and remediation, the automation of service requests, and other activities that aim to improve operational efficiency, observability, resiliency and performance efficiency for the platform.

Key Skills - Must Haves

  • Good troubleshooting and triaging skills to get to the root cause of an issue.
  • Good understanding of the principles of SRE when operationalizing large platforms.
  • 2-3 years of hands-on development or programming experience in any of these tech stacks: JavaScript / TypeScript / Python / Java. Should be willing to pick up TypeScript/Python for operational tasks.
  • Good communication skills.
  • 2 years of experience with any Infrastructure as Code tool: CDK/CloudFormation/Terraform/Pulumi. Must be willing to learn CDK/CloudFormation.
  • 3 years of hands-on experience in AWS Cloud.
  • Good understanding of networking for triaging issues.
  • Expert knowledge in all aspects of designing, developing and managing large real-time systems.
  • Project and process management.
  • Prior successful experience as a site/systems reliability engineer.
  • Demonstrated experience working in large, complex systems environments.
  • A passion for performance excellence and robustness, and an engineering mindset.
  • Operational experience in maintaining applications.
  • Must be willing to provide on-call support for critical issues.

Good To Have

  • Experience working on serverless services and event-oriented architecture (AWS) is a plus: AWS Lambda, DynamoDB, API Gateway, Kinesis.
  • Experience with the AWS IoT Core platform is a plus.
  • Experience with automated runbooks using Systems Manager Runbooks, Rundeck or similar tools is a plus.

Job Detail

  • Job Id
    JD3179116
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Pune, Maharashtra, India
  • Education
    Not mentioned
  • Experience
    5+ years