High Availability And Scalability Engineering Lead

TN, IN, India

Job Description

Here's a comprehensive job description for a High Availability and Scalability Engineering Lead, suitable for enterprise, SaaS, or mission-critical infrastructure teams:

Job Title:

High Availability & Scalability Engineering Lead

Role Overview:

The High Availability & Scalability Engineering Lead is responsible for designing, implementing, and managing highly available, fault-tolerant, and scalable systems to support critical business applications. This role blends deep technical expertise in distributed systems, cloud infrastructure, and performance optimization with leadership and cross-functional collaboration.

You will lead a team of engineers to ensure that all platforms meet stringent SLAs for uptime, resilience, and scalability, especially under peak loads or failure scenarios.

Key Responsibilities:

Architecture & Design

  • Design and implement high-availability architectures using clustering, load balancing, replication, and failover strategies.
  • Lead design reviews for scalable distributed systems (microservices, event-driven, or service mesh architectures).
  • Evaluate and adopt cloud-native technologies (e.g., Kubernetes, ECS, autoscaling groups, service meshes, serverless) to enhance elasticity and resilience.
  • Drive the definition of RTO/RPO, failover automation, and multi-region deployment strategies.

Implementation & Operations

  • Develop and enforce SLAs, SLOs, and SLIs for reliability, latency, and performance.
  • Lead efforts in capacity planning, performance tuning, and chaos testing to ensure predictable system behavior under stress.
  • Collaborate with DevOps and SRE teams to automate infrastructure provisioning (e.g., Terraform, Pulumi, CloudFormation).
  • Establish monitoring, alerting, and self-healing mechanisms using tools such as Prometheus, Grafana, Datadog, or New Relic.

Leadership & Strategy

  • Mentor and guide engineers on designing resilient, performant, and secure architectures.
  • Partner with product and platform engineering to forecast future growth and capacity needs.
  • Create frameworks and best practices for high availability, DR, and horizontal scalability across teams.
  • Lead incident reviews, root cause analysis, and reliability retrospectives to drive continuous improvement.

Required Skills & Qualifications:

  • Bachelor's or Master's in Computer Science, Engineering, or related field.
  • 8+ years of experience in backend, infrastructure, or systems engineering; 3+ years in a leadership or architect role.
  • Deep expertise with cloud platforms (Azure) and container orchestration (Kubernetes, Docker, ECS).
  • Proficiency in distributed systems design, load balancing, replication, failover, and data partitioning.
  • Strong programming experience in one or more: Go, Python, Java, or C++.
  • Experience with observability and reliability engineering (monitoring, logging, tracing, SLOs).
  • Proven ability to lead cross-functional initiatives, drive architectural decisions, and scale systems supporting millions of users or high transaction volumes.

Preferred Qualifications:

  • Hands-on experience with multi-region, multi-cloud architectures.
  • Certification in Microsoft Azure.
  • Background in SRE principles, Chaos Engineering, or Resilience Engineering.
  • Knowledge of event-streaming technologies (Kafka, Pulsar, RabbitMQ) and distributed databases (Cassandra, CockroachDB, DynamoDB).

Success Indicators:

  • Achieved uptime and latency SLAs consistently across services.
  • Reduction in mean time to recovery (MTTR) and incident frequency.
  • Documented and automated failover and scaling strategies.
  • Demonstrated mentorship and technical leadership within engineering teams.

Job Type: Full-time

Pay: ₹670,805.33 - ₹2,059,333.85 per year

Benefits:

  • Cell phone reimbursement
  • Health insurance
  • Internet reimbursement
  • Paid sick time
  • Paid time off

Job Detail

  • Job Id
    JD4525968
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    TN, IN, India
  • Education
    Not mentioned
  • Experience
    Year