As an SRE SME, the engineer ensures the reliability, availability, performance and scalability of business applications, coordinating closely with business, IT and operations teams to manage operational challenges.
Key responsibilities:
Reliability and uptime:
+ Ensure high availability and reliability of critical business systems by implementing best practices in monitoring, alerting and incident management.
Automation:
+ Develop and implement automation tools to eliminate manual processes and improve system efficiency.
Incident management:
+ Respond to system outages, troubleshoot issues and resolve them quickly to minimize downtime and business impact.
Monitoring and observability:
+ Design and manage observability solutions using tools like Prometheus, Grafana or Datadog to monitor system performance.
Performance and optimization:
+ Identify and fix bottlenecks in applications and infrastructure to improve overall system performance.
Capacity planning:
+ Analyse system demands and ensure appropriate scaling to meet future requirements.
Service level objectives:
+ Define, measure and monitor SLOs and SLAs to maintain system health and reliability.
Collaboration:
+ Partner with development and operations teams to integrate reliability into the software lifecycle and CI/CD pipelines.