Deep understanding of blameless postmortems, incident management, root cause analysis, and log analysis.
Expert knowledge of Python plotting tools and libraries.
Collaborate with cross-functional teams to implement and maintain observability solutions using Fiberplane to monitor and analyse complex, distributed systems.
Lead and participate in blameless postmortem analysis of incidents, identifying root causes and driving actionable recommendations for incident prevention and system improvement.
Develop and maintain automation scripts and tools using Python to streamline operational tasks, deployments, and system management.
Implement and manage monitoring, alerting, and logging solutions in a Linux-based environment to proactively detect and respond to performance bottlenecks and issues.
Work closely with the SRE team to enhance system reliability, scalability, and resilience through architecture and infrastructure improvements.