Indicative years of total experience: 14 - 16 years
Location:
-------------
Pune/Hyderabad
Department:
---------------
Engineering / IT Operations
Reporting relationship:
This role reports to the Program Manager.
Job Type:
-------------
Full-Time (Hybrid)
Job Summary:
----------------
We are seeking a seasoned SRE Manager to lead our Observability & Reliability Engineering team, with a strong focus on IT Operations Management (ITOM) practices. The role is responsible for end-to-end reliability, performance, and operational excellence across our infrastructure and applications. The ideal candidate will also oversee the ServiceNow ITOM modules, ensuring seamless integration and automation of IT operations workflows.
Key Responsibilities:
-------------------------
Leadership & Strategy
* Lead and mentor a team of SREs and Observability Engineers.
* Define and drive the strategic roadmap for reliability, observability, and ITOM practices.
* Collaborate with cross-functional teams (DevOps, Platform Engineering, Application Development, and ITSM) to align reliability goals with business objectives.

Observability & Monitoring
* Own the observability stack, including metrics, logs, traces, and dashboards.
* Implement and manage tools such as Prometheus, Grafana, ELK, Splunk, Datadog, or similar.
* Drive proactive monitoring, alerting, and anomaly detection to reduce MTTR and improve system health.

Reliability Engineering
* Champion SRE principles such as SLIs, SLOs, and error budgets.
* Lead incident response and postmortem processes to ensure continuous improvement.
* Automate operational tasks and improve system resilience through chaos engineering and fault injection.

ITOM Practice Management
* Oversee the implementation and optimization of ServiceNow ITOM modules (Discovery, Event Management, Orchestration, CMDB).
* Ensure accurate and up-to-date CMDB data to support incident, problem, and change management processes.
* Drive automation of IT operations workflows using ServiceNow and other orchestration tools.

Process & Governance
* Establish and enforce best practices for change management, incident management, and problem resolution.
* Ensure compliance with internal and external audit requirements related to IT operations.

Stakeholder Engagement
* Act as a key liaison between engineering, operations, and business stakeholders.
* Provide regular updates and reports on system reliability, performance, and operational KPIs.
Qualifications:
Required Qualifications:
----------------------------
* Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
* 10+ years of experience in IT operations, DevOps, or SRE roles.
* 3+ years in a leadership or managerial role.
* Hands-on experience with observability tools and practices.
* Strong expertise in ServiceNow ITOM modules and CMDB management.
* Excellent communication, leadership, and stakeholder management skills.
Preferred Skills:
---------------------
* Certifications in SRE, ServiceNow ITOM, and cloud platforms (AWS, Azure, GCP).
* Experience with infrastructure as code (Terraform, Ansible).
* Familiarity with containers and container orchestration (Docker, Kubernetes).
* Knowledge of ITIL processes and frameworks.
Additional Information:
Required Behavioral Competency:
-----------------------------------
* Make sound business decisions
* Embrace change
* Build strong partnerships
* Get results
* Act strategically
* Lead and cultivate talent