Principal DevOps Engineer

India, India

Job Description


How is this team contributing to the vision of Providence?

Healthcare Intelligence is the pillar that focuses on creating intelligent products within Providence and provides unique opportunities in Product Development, Product Design, Data Engineering, Operations, Data Science, BI Reporting and Data Analytics on a cloud stack. We are a group of professionals who work towards enabling decisions that improve patient and caregiver experience. Now, as we face a new frontier, a changing health care landscape, we are looking for pioneering and compassionate individuals to plan for the next century and re-imagine the future of care with cutting-edge technologies such as big data, machine learning, artificial intelligence, IoT, and blockchain that enhance patient outcomes and experiences and, more importantly, drive a lasting social impact.

What will you be responsible for?

As an SRE engineer on the Healthcare Intelligence team, you will monitor, analyze and establish SLOs/SLIs and operational metrics to maintain the availability and reliability of applications, enabling business continuity. SRE engineers are responsible for ensuring that the underlying infrastructure is running smoothly and that systems and tools are working as expected. As an SRE Principal Engineer, you will:

  • Play a lead role in the SRE Engineering team and work with the Product Engineering, DevOps and Operations teams, playing a pivotal role in the uptime of applications and services.
  • Act as the first line of defense for the production health of HI applications and product services.
  • Collaborate extensively with Product teams to discuss SRE practices, SLOs/SLIs and project roadmaps.
  • Design, build and maintain tools and frameworks that support deployment automation and application health checks.
  • Play a vital role in driving a regular cadence with Product teams on SLI/SLO dashboards, operational metrics, reliability and availability.
  • Own end-to-end availability and performance of key services and build automation to prevent problem recurrence.
  • Automate response to all non-exceptional service conditions.
  • Lead by example, mentor the team and establish credibility through quality technical execution.
  • Manage on-call rotations across geo-locations, using a follow-the-sun model.
  • Apply sound troubleshooting skills and participate in Severity issue and CODE RED calls.
  • Create a knowledge repository of Severity issues and best practices.
  • Take the initiative to set up knowledge-sharing sessions that build awareness of different products and their functionality.

What would your day look like?

  • Monitor SRE dashboards, highlight deviations in SLOs/SLIs, and work with the Incident Management Command Center and Product teams to fix any potential issues.
  • Act as SME and quarterback in Severity 1 and 2 issues and CODE RED situations, guiding Product teams towards faster resolution.
  • Partner with the Engineering team on measuring and improving the availability and reliability of applications and product services.
  • Further monitor SRE dashboards and alerts and find opportunities to create more meaningful monitors that increase reliability and availability for users.
  • Create automation frameworks and pipelines for deploying monitors, and build SRE dashboards that can be consumed by various product teams.
  • Track and deliver assigned sprint items in a timely manner and with high quality.

Who are we looking for?

  • Bachelor's degree or equivalent in Engineering.
  • 7+ years of software development and engineering experience in large-scale enterprises.
  • Good knowledge of the Software Development Life Cycle.

Technical knowledge:

  • Strong understanding of SRE operations methodology and its implementation in an enterprise environment.
  • Experience in multiple programming languages (Java/J2EE, Python, Node.js, PowerShell).
  • Strong knowledge of SQL and NoSQL.
  • Proficiency in Kubernetes and Docker, AKS and Helm charts.
  • Understanding of Web Application Firewall and Azure Security policies.
  • Expert in monitoring and APM tools such as Datadog, Azure Log Analytics, Splunk, New Relic, Nagios, Graphite, Grafana, etc.
  • Proficient with modern DevOps practices, including CI/CD using Terraform and ARM templates.
  • High proficiency in cloud technologies and infrastructure, preferably Azure.
  • 2 years' understanding of the Linux cgroups ecosystem and the technologies surrounding it, e.g., Docker, Mesos, Kubernetes.
  • 4 years' experience with systems, network and application security best practices, threat models, defensive security best practices, and writing code to test systems and applications using techniques such as penetration testing and chaos engineering.
  • Understanding of best practices for systems, networking and application security, e.g., how common injection attacks play out.
  • Strong critical thinking skills and the ability to think on your feet.
  • Ability to adapt quickly and maintain a positive attitude.
  • Excellent verbal and written communication skills.
  • Ability to take ownership of issues.
  • Good collaborative skills to work with local and global teams.


Job Detail

  • Job Id
    JD3106405
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    India, India
  • Education
    Not mentioned
  • Experience
    Year