Assistant Vice President Site Reliability Engineering Lead

Year    Thane, Maharashtra, India

Job Description


A. Position Overview

Position Title: SRE Lead
Department: IT
Level / Band: AVP/VP

Role Summary:
  • Lead the SRE practice.
  • Set up processes to govern KPI monitoring for reliable and stable IT operations across the organization.
  • Design and implement systems for the reliability, availability, and scalability of services.
  • Develop and implement an incident response process that ensures quick response.
  • Partner with development teams to ensure applications and services are designed and developed with reliability in mind.
  • Continuously improve systems to ensure optimal performance, efficiency, and reliability.
  • Maintain documentation so that processes and procedures are well documented and communicated to the team.
  • Perform competitive analysis of application availability to keep downtime to a minimum.

The role requires the candidate to understand the insurance domain, mentor the team, and collaborate with vendor IT teams to ensure architecture alignment. The role also supports project managers and covers integration design and design reviews, with attention to proactive monitoring, fail-safe operation, performance, security, and the impact of downtime of any application in any environment. The SRE Lead supports project engagements, product enhancements, and new enhancements by understanding business requirements and performing competitive analysis for the development of integrations.

The SRE Lead is fully responsible for the availability and management of the platform tools that help speed up the delivery process, such as the architectural runway and deployment plans, and for maintaining high availability, scalability, and fault tolerance. The role also manages strategic solutions for disaster recovery and helps define recovery point objectives (RPO) and recovery time objectives (RTO).

This applies to platform and infrastructure components, both on-premises and in the cloud, in line with IT industry frameworks and best practices, so that the platform tools and other applications that generate value for TALIC remain highly available and support our long-term IT strategy. The SRE Lead leads a team of SREs and ensures the reliability, availability, and scalability of the organization's applications and services. The candidate should possess foundational working knowledge of IT industry best practices and Well-Architected frameworks for application lifecycle management, and should improve the overall service by identifying internal process improvements, training needs, and opportunities for continuous deployment, upgrades, and application monitoring.

B. Organizational Relationships

  • Reports to: IT Infra/Operation Head
  • Supervises: SRE team

C. Job Dimensions

  • Geographic area covered: Pan India
  • Internal stakeholders: All departments
  • External stakeholders: IT / DC vendors

D. Key Result Areas

Project / Delivery Management
  • Assist the SRE team with deployment and technical functions for DAST and IAST to detect vulnerabilities at runtime.
  • Keep the application environment up and running through patching, upgrades, and monitoring.
  • Support application availability and share an RCA whenever downtime occurs.
  • Support and introduce solutions for high availability, scalability, and fault tolerance, so that applications stay highly available and scalable and recover on their own from crashes.
  • Act as the interface between the application and delivery teams on application availability, reliability, scalability, and security standards, meeting expectations without interfering with application functionality and performance.

Solution Architecture
  • Contributes to the development of solution architectures in specific business, infrastructure, or functional areas.
  • Identifies and evaluates alternative architectures and the trade-offs in cost, performance, and scalability.
  • Produces specifications of cloud-based or on-premises components, tiers, and interfaces for translation into detailed designs using selected services and products.
  • Supports a change program or project through the preparation of technical plans and the application of design principles that comply with enterprise and solution architecture standards (including security).

Systems Design
  • Designs and implements systems to ensure the reliability, availability, and scalability of services.
  • Identifies and evaluates alternative design options and trade-offs.
  • Creates multiple design views to address the concerns of the different stakeholders of the architecture and to handle both functional and non-functional requirements.
  • Models, simulates, or prototypes the behavior of proposed system components to enable approval by stakeholders.
  • Produces detailed design specifications to form the basis for construction of systems.
  • Reviews, verifies, and improves own designs against specifications.

Software Design
  • Designs software components and modules using appropriate modelling techniques, following agreed software design standards, patterns, and methodology.
  • Creates and communicates multiple design views to identify and balance the concerns of all stakeholders of the software design and to allow for both functional and non-functional requirements.
  • Identifies and evaluates alternative design options and trade-offs.
  • Recommends designs that take into account the target environment, performance and security requirements, and existing systems.

Systems Integration and Build
  • Provides technical expertise to enable the configuration of software, other system components, and equipment for system availability, reliability, and scalability.
  • Collaborates with technical teams to develop and agree on system availability plans, and reports on any downtime.
  • Defines complex/new integration builds.
  • Ensures that all environments are correctly configured and available after integration of a new build.
  • Designs and performs tests of the integration build and reports the results.
  • Identifies and documents system integration components for recording in the configuration management system.
  • Recommends and implements improvements to processes and tools that help integrate builds without any downtime.

Risk Management and Mitigation
  • Analyses processes and systems to mitigate risk and provides solutions for unstated functional requirements.
  • Ensures all IT design and architecture risks are managed.

Managing the Team
  • Leads and manages a team of SREs, including hiring, training, and developing team members.
  • Guides the extended team in sequencing project deliverables so that all building blocks fit together with no rework or surprises.
  • Reviews technical, integration, and database design artifacts, and source code if required.
  • Designs and implements monitoring systems and creates dashboards to track key performance indicators.

Misc
  • Ensures compliance with AIA and TALIC standards.
  • Constantly aligns with the IT team, vendor teams, and business stakeholders.

E. Skills Required

Technical
  • Expertise in setting up Prometheus and Grafana dashboards.
  • Experience working on complex setups for an observability stack.
  • Experience setting up tools such as Dynatrace, AppDynamics, SolarWinds, HP OpenView, ManageEngine, Symphony Summit, etc. is a real advantage.
  • Experience integrating monitoring tools with Jira, and past experience working with Jira, is an advantage.
  • Azure cloud experience, experience working with containerized environments, and an understanding of Jenkins-based CI/CD automation are a must for this role.
  • Experience managing large environments and maintaining the availability, scalability, and reliability of multiple applications across environments using DevOps tools such as Jenkins, Docker, Kubernetes, Helm, HashiCorp Terraform, Vault, service mesh, load balancers, API gateways, Prometheus, and Grafana, along with dynamic security vulnerability scanning tools such as Snyk and StackHawk.
  • Relevant experience with on-premises and Azure public cloud environments implementing SAST/DAST/IAST using Snyk and StackHawk.
  • Relevant experience managing GitLab as a code repository, including user management and overall administration of the GitLab server.
  • Relevant experience in database management and query execution using the Flyway or Liquibase plugins for Jenkins.
  • Fundamental experience with Azure Cognitive Services (ML, AI, etc.), Azure DevOps, and Azure data and database services.

Behavioral
  • Ability to manage complex relationships effectively
  • A communication style that has a positive impact
  • Flexibility and resilience
  • Ability and willingness to challenge constructively and effectively
  • Good communication skills
  • Interpersonal, communication, creative thinking, supervising/leadership, teamwork, influencing, relationship-building, and decision-making skills

Education
  • University bachelor's degree, preferably an engineering degree in Computers or IT.

F. Incumbent Characteristics

Qualification: B.E.

Experience:
  • 10-12 years of experience setting up observability tools such as Prometheus and Grafana across the organization.
  • Experience working with Jenkins, Docker, Kubernetes, Helm, HashiCorp Terraform, Vault, service mesh, load balancers, API gateways, Prometheus, and Grafana, along with dynamic security vulnerability scanning tools such as Snyk and StackHawk.
  • Experience managing applications in Azure Cloud and technologies such as Java/J2EE, Node.js, and Angular on Kubernetes.
  • Good knowledge of implementing DevOps CI/CD/CT tools for continuous delivery of applications.
  • Experience integrating security into code during development using Infrastructure as Code, Configuration as Code, Security as Code, and Policy as Code.
  • Good interpersonal and communication skills, with the ability to build productive relationships across the participants in the ecosystem.



Job Detail

  • Job Id
    JD3197819
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Thane, Maharashtra, India
  • Education
    Not mentioned
  • Experience
    Year