As a Monitoring SME & Architect, you will be responsible for designing and implementing comprehensive monitoring solutions and processes that ensure uptime, system health, performance and reliability. You will be responsible for reducing alert volume, implementing intelligible alerting, alert correlation and alert compression, measuring the signal-to-noise ratio and setting up an early warning system across Operations. You will collaborate across teams to create centralized dashboards and visibility that break down silos. You will architect monitoring configurations in a scalable and secure model, leveraging automation, with a future scope of AI-integrated monitoring operations.
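To illustrate what alert compression can mean in practice, the sketch below collapses repeated alerts for the same host and check into one group when each arrives within a time window of the previous one. It is a minimal sketch under stated assumptions: the `Alert` record, the `(host, check)` grouping key and the 300-second window are all hypothetical choices, and commercial monitoring tools apply their own correlation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert record: source host, check name, epoch seconds.
    host: str
    check: str
    timestamp: int


def compress_alerts(alerts, window_seconds=300):
    """Group alerts that share (host, check) and arrive within
    window_seconds of the previous alert in the same group."""
    groups = []       # each entry is a list of Alert objects
    open_group = {}   # (host, check) -> index of its most recent group
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.host, alert.check)
        idx = open_group.get(key)
        if idx is not None and alert.timestamp - groups[idx][-1].timestamp <= window_seconds:
            groups[idx].append(alert)  # still inside the window: compress
        else:
            open_group[key] = len(groups)  # start a new group for this key
            groups.append([alert])
    return groups


def compression_ratio(alerts, groups):
    """Fraction of raw alerts eliminated by compression (0.0 if none)."""
    if not alerts:
        return 0.0
    return 1 - len(groups) / len(alerts)
```

For example, three CPU alerts from one host a minute apart, one disk alert from another host, and a fourth CPU alert fifteen minutes later compress from five raw alerts into three groups, a 40% reduction. Tracking this ratio over time is one rough, assumed proxy for noise reduction; it does not by itself say whether the remaining alerts are actionable.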
Problem-solving skills - Able to devise technical and creative solutions, and to use analytics to understand patterns and proactively identify gaps.
Communication skills - Effective communication is key in this role: gathering data about problems, preparing detailed notes and reports, and updating users on next steps.
Time management - Excellent time management skills and the ability to set priorities when handling multiple cases.
Team collaboration - Routinely works with other functions to resolve user issues, so must collaborate effectively with team members and coworkers.
Highly motivated, hands-on personality.
Ability to learn quickly in a challenging environment.
Key Accountability
Monitoring effectiveness - Ensure the monitoring framework and its enhancements are set up to increase proactive identification and resolution of issues before they impact customers.
Set up and maintain centralized monitoring configuration as code
Consistently drive alert volume down and eliminate false alerts
Set up advanced alerting on the golden signals, i.e. latency, errors, traffic (throughput) and saturation.
Move from traditional symptom-based CPU and memory monitors to advanced alert correlation that pinpoints the underlying issue and enables predictive monitoring
Create and implement synthetic and end-user monitoring using Python and Selenium to monitor customer experience
Set up API endpoint monitoring and measure uptime and availability across customer, product and infrastructure endpoints.
Implement SLO, SLI and error budget concepts to measure service reliability and establish a maturity model
Maintain and manage a code repository built for scale and security
Leverage automation to push changes to monitoring tools
Set up orchestration for onboarding and decommissioning to ensure operational readiness
Set up dashboards and create visibility across all cross-functional teams
Establish telemetry for automated collection of metrics, logs and traces
Continuously analyze data to identify gaps and implement improvements
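The SLI, SLO and error budget concepts above reduce to simple arithmetic. As a minimal sketch, assume an availability SLI defined as successful probes (for example from API endpoint or synthetic checks) divided by total probes over a fixed window; the error budget is then the failure rate the SLO tolerates. The function names are illustrative only and are not part of any monitoring tool's API.

```python
def availability_sli(good: int, total: int) -> float:
    """Availability SLI: fraction of probes or requests that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return good / total


def error_budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left in the measurement window.

    The error budget is the failure rate the SLO tolerates (1 - slo).
    1.0 means the budget is untouched, 0.0 means it is exhausted,
    and a negative value means the SLO has been breached.
    """
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    budget = 1 - slo
    observed_failure_rate = 1 - availability_sli(good, total)
    return 1 - observed_failure_rate / budget
```

For example, with a 99.5% SLO and 999 successful probes out of 1,000, the observed failure rate (0.1%) consumes one fifth of the 0.5% budget, so about 80% of the budget remains. With 990 successes out of 1,000, the failure rate is double the budget and the function returns roughly -1.0.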
Minimum Requirements
Associate's degree (or equivalent) in Computer Science, Information Technology or a related field preferred
10-12 years of IT experience, including 6 years of monitoring experience
Experience administering monitoring tools such as AppDynamics, SolarWinds, Grafana, Zabbix, Datadog and the ELK Stack
Hands-on experience with logs, metrics, traces, parsing, regex and tagging
Hands-on experience implementing APM, EUM, synthetics, API endpoint monitoring, etc.
Hands-on experience with integrations into ITSM tools such as ServiceNow and Jira
Hands-on experience with Ansible, Python, Selenium and shell scripting
Hands-on experience with enterprise-scale Azure, VMware and AWS environments
Hands-on experience creating dashboards and performing analysis
Excellent interpersonal and influencing skills; able to interact appropriately with colleagues of all technical skill levels and to remain calm and courteous while resolving problems in high-stress situations.
Our Values
If you want to know the heart of a company, take a look at its values. Ours unite us. They are what drive our success, and the success of our customers. Does your heart beat like ours?
All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.
Beware of fraud agents! Do not pay money to get a job.
MNCJobsIndia.com will not be responsible for any payment made to a third party. All Terms of Use are applicable.
Job Detail
Job Id: JD3772667
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: Bangalore, Karnataka, India
Education: Not mentioned
Experience: Year