Principal Site Reliability Engineer

Gurgaon, Haryana, India

Job Description


Combine two of the fastest-growing fields on the planet with a culture of performance, collaboration and opportunity and this is what you get. Leading-edge technology in an industry that's improving the lives of millions. Here, innovation isn't about another gadget, it's about making health care data available wherever and whenever people need it, safely and reliably. There's no room for error. Join us and start doing your life's best work.(sm)

Primary Responsibilities:

  • Own the end-user experience by leading the Site Reliability Engineering (SRE) strategy for the Optum Everycare product line
  • Build and lead the SRE function that owns application availability and performance. Build tools that drive automation and proactive/predictive alerting, backed by a robust data analytics toolset to identify improvement areas
  • Implement comprehensive service monitoring to ensure uptime and performance, including synthetic, real-user, system, and application performance monitoring, dashboards, etc.
  • Define, measure, and meet key Service Level Objectives, including availability, performance, incidents, and chronic problems
  • Drive strategy for end-to-end availability and performance of critical services and build automation to prevent problem recurrence. The eventual goal is an automated response to all non-exceptional service conditions
  • Partner with engineering and product teams to ensure a high-quality product is developed and released into production
  • Build a DevOps culture to provide high-quality, continuous operations and ongoing support, ensuring that critical service-level metrics, customer requirements, and financial objectives are met
  • Develop and execute action plans to address performance and user issues
  • Lead cross-functional reliability events: deep application dives, Peak Season capacity management, and audits
  • Bring reliability investment ideas to be included in the technology roadmap
  • Comply with the terms and conditions of the employment contract, company policies and procedures, and any and all directives (such as, but not limited to, transfer and/or re-assignment to different work locations, change in teams and/or work shifts, policies with regard to flexibility of work benefits and/or work environment, alternative work arrangements, and other decisions that may arise due to the changing business environment). The Company may adopt, vary or rescind these policies and directives in its absolute discretion and without any limitation (implied or otherwise) on its ability to do so
Required Qualifications:
  • Undergraduate degree in applicable area of expertise or equivalent experience
  • 10+ years of professional experience supporting/leading enterprise infrastructure for a large Fortune 500 enterprise
  • 5+ years of supervisory and management experience, preferably with large, global teams
  • Prior experience managing a large, high-profile, customer-facing portfolio and transforming a traditional reactive support model into a predict/prevent support model using service reliability engineering concepts
  • Demonstrated experience in communicating high-priority incidents, operating issues, and incident/change summaries, and experience in managing vendors with services hosted on-premises and in the cloud
  • Knowledge of DevOps tools, SRE, and ITIL frameworks for service delivery and service support processes
  • Excellent knowledge of ITIL frameworks for service delivery and service support processes; thorough understanding of ITIL best practices and trends
  • Exposure to application monitoring tools like Dynatrace, Splunk, SiteScope, Grafana, etc.
  • Ability to establish effective working relationships with executives (including C-level) and key business partners to drive a change agenda
  • Demonstrated ability to manage remote teams, fostering high collaboration between all team members in establishing a single, cohesive culture
  • Results-oriented, with a high level of emotional intelligence, strong communication and interpersonal skills, outstanding collaboration, and strong influencing and consensus-building skills
Preferred Qualifications:
  • Demonstrated experience modernizing operations, resulting in fewer incidents and the automation of manual tasks through modern reliability engineering
  • Experience in migrating legacy infrastructure to the cloud, or extensive experience supporting services that are hosted in external cloud infrastructure
  • Demonstrated experience in Service Reliability Engineering and/or deep working experience with Grafana, Splunk, Dynatrace, Systrack/NexThink, Python, New Relic, etc.

Careers with Optum. Here's the idea. We built an entire organization around one giant objective: make health care work better for everyone. So when it comes to how we use the world's large accumulation of health-related information, or guide health and lifestyle choices or manage pharmacy benefits for millions, our first goal is to leap beyond the status quo and uncover new ways to serve. Optum, part of the UnitedHealth Group family of businesses, brings together some of the greatest minds and most advanced ideas on where health care has to go in order to reach its fullest potential. For you, that means working on high-performance teams against sophisticated challenges that matter. Optum, incredible ideas in one incredible company and a singular opportunity to do your life's best work.(sm)


Job Detail

  • Job Id
    JD2874265
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Gurgaon, Haryana, India
  • Education
    Not mentioned
  • Experience
    Year