Key Responsibilities
Toolchain Evaluation & Modernization
Evaluate legacy monitoring and alerting tools (e.g., BMC MainView, SolarWinds).
Recommend and integrate a unified observability stack using Splunk, Dynatrace, Grafana, and Elastic Stack.
Ensure end-to-end visibility across infrastructure, applications, and user experience.
AIOps Enablement
Deploy AIOps capabilities (event correlation, noise reduction, predictive analytics) using Dynatrace and Splunk.
Enable intelligent alerting and root-cause analysis using machine learning (ML) models.
Integrate ServiceNow ITOM for automated incident creation and enrichment.
Automation & Self-Healing
Develop automation playbooks and runbooks (Python, PowerShell, Ansible) for common incident types.
Enable auto-remediation pipelines triggered by AIOps events.
Support auto-scaling, service restarts, and configuration drift correction.
Observability Architecture & Implementation
Deploy log, metric, and trace collection using Elastic Stack and Dynatrace.
Define and implement Service Level Objectives (SLOs), error budgets, and MTTR/MTTD benchmarks.
Build dashboards in Grafana, Dynatrace, and ServiceNow Performance Analytics.
Operational Process Reengineering
Redesign and automate event, incident, change, and problem management processes.
Align monitoring workflows with the ServiceNow CMDB and configuration item (CI) health status.
Shift operations from reactive to proactive by leveraging predictive insights.