Skills
SRE mindset in production support: proactive issue identification using observability tools.
Skilled in using a range of monitoring and observability tools to track system performance.
Incident commander: ability to diagnose complex issues and actively drive incident calls, working with technical and product SMEs and Tier 2 SREs.
Communication: excellent communicator who can interact with Directors, Sr. Directors, and above.
Understanding of SLA, SLO, and SLI concepts.
Technical expertise
AppDynamics, New Relic, Splunk (including Splunk APM and Splunk O11y), Grafana, RedMetrics, ThousandEyes
Knowledge of VMs, load balancers, firewalls, API gateways, databases, networking, and Linux/Unix
Knowledge of Containerization, Docker, Kubernetes, AWS, PCF, GCP
ServiceNow (including AIOps, tools for Self-Heal and automated playbooks)
APM, NMON, and Wireshark usage and analysis
Experience in UEM and synthetic monitoring tools
Responsibilities
Production support activities, including proactive identification of issues using observability tools, with the aim of reducing MTTD and MTTR.
Coordinate all activities required to lead incident triage in compliance with SLAs and OLAs, correlating inputs from various dashboards and tools to drive resolution.
Flexibility to work in a 24x7 environment.