Software Development Lead

TS, IN, India

Job Description

Project Role :

Software Development Lead

Project Role Description :

Develop and configure software systems either end-to-end or for a specific stage of product lifecycle. Apply knowledge of technologies, applications, methodologies, processes and tools to support a client, project or entity.


Must have skills :

Site Reliability Engineering

Good to have skills :

DevOps, AWS Administration

Minimum 15 years of experience is required

Educational Qualification :

BTECH



Summary :

As the manager of the Site Reliability Engineering (SRE) team, you will lead a high-impact team focused on building and scaling automation, observability, and incident response to improve the reliability, stability, and performance of our cloud-based services. You will play a critical role in shaping the reliability strategy for our SaaS platforms, driving innovation in incident response, and ensuring our systems are resilient, performant, and aligned with customer expectations. This role requires a strong technical foundation, a passion for operational excellence, and the ability to lead cross-functional collaboration across engineering and operations teams.

Roles and responsibilities :

  • 12+ years of relevant experience in SRE, DevOps, or infrastructure engineering, with 4+ years in a technical leadership or management role.
  • People management experience within software development, managing a high-performing team with strong SRE capabilities around observability, reliability, and incident response.
  • Ability to roll out enterprise-level programs (e.g., SLOs, incident response, observability standards) across a variety of product and engineering teams.
  • Recruit, interview, and hire top engineering talent to fill team needs in line with a broader talent strategy.
  • Develop engineers' career plans and skills growth.
  • Identify areas for engineers to build more knowledge, and create opportunities for them to exercise these new engineering and soft skills in practice.
  • Work closely with leadership located in other geographies on joint efforts to drive the Site Reliability Engineering journey.
  • Deep understanding of SRE principles, including SLOs and SLIs.
  • Strong knowledge of cloud platforms (AWS, Azure, OCI) and infrastructure-as-code tools (e.g., Terraform).
  • Expertise implementing and running observability and monitoring tools (e.g., Datadog, Dynatrace, ELK).
  • Lead incident response processes, including coordination, root cause analysis (RCA), and long-term mitigation.
  • Experience managing teams in a 24/7 production environment.
  • Proficiency in software development automation (e.g., Python, Go, Shell).
  • Excellent communication and collaboration skills across technical and non-technical stakeholders.
  • Foster a culture of continuous improvement, blameless postmortems, and proactive monitoring.
  • Balance innovation with operational excellence.
  • Drive alignment across engineering, product, and operations on service health and customer impact.
  • Practitioner of agile practices, able to play lead roles such as Scrum Master or Product Owner. Agile role certifications are a plus, including value stream mapping practices to identify and eliminate waste in software delivery processes.
  • Display empathy towards engineers and their friction, and work with them to develop common solutions.

Technical experience & Professional attributes :

  • Lead and mentor a team of Site Reliability Engineers aligned to value streams and agile teams.
  • Define and implement SRE best practices, including incident management, blameless postmortems, and error budgeting.
  • Drive the adoption of observability standards across the enterprise using tools like Datadog and CloudWatch.
  • Collaborate with engineering teams to design scalable, fault-tolerant systems with insightful observability.
  • Partner with Customer Success and Product teams to map and monitor key user journeys.
  • Manage on-call rotations and ensure effective incident response and root cause analysis.
  • Contribute to the evolution of our Cloud Platform by standardizing monitoring, alerting, and deployment practices.
  • Support training and enablement efforts through internal platforms.

Education qualifications :

Bachelor's degree in Computer Science, Information Systems, or a related field, or an equivalent combination of education and experience. A Master's degree is a plus.

Additional Information :

  • Be part of the larger Site Reliability and Cloud Engineering organization.
  • Be an influential people leader of a new site. This includes working with the site leader and senior leadership to coordinate site-level activities and other functions as the site grows.
  • Manage a team of varying seniority and skills around Site Reliability Engineering practices.
  • You will be working with a trusted tax technology leader committed to delivering reliable and innovative solutions.







Job Detail

  • Job Id
    JD4486455
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    TS, IN, India
  • Education
    Not mentioned
  • Experience
    Year