Site Reliability Engineer

Year    Hyderabad, Telangana, India

Job Description


We are looking for an experienced Site Reliability Engineer (SRE) Lead to join our growing team. The ideal candidate will have a proven track record of leading a team of SREs and driving the reliability and performance of mission-critical applications and infrastructure.

Roles & Responsibilities:

  • Design, implement, and maintain highly scalable, reliable, and performant systems and services.
  • Develop and maintain a deep understanding of our production systems and infrastructure to identify and resolve potential issues proactively.
  • Build and maintain monitoring and alerting systems to ensure timely and effective detection and resolution of incidents.
  • Collaborate with cross-functional teams to develop and implement best practices for application and infrastructure performance and reliability.
  • Drive continuous improvement of our systems and processes to increase efficiency, reduce downtime, and improve system availability.
  • Build and maintain strong relationships with internal stakeholders to understand their needs and requirements and ensure that our systems meet or exceed their expectations.

Qualifications:

  • Bachelor's degree in Computer Science or a related field, or equivalent experience.
  • 4+ years of experience in site reliability engineering or a related field.
  • Hands-on experience managing cloud infrastructure at scale (AWS preferred, and Azure).
  • Must have 2+ years of experience building infrastructure automation and observability stacks (Terraform, Ansible, Prometheus, Grafana).
  • Must have experience building CI/CD pipelines for containerised applications (Jenkins, Azure Pipelines, etc.).
  • Must have experience with Docker, Kubernetes, and other cloud-native technologies.
  • Strong system debugging skills.
  • Strong experience with cloud-based infrastructure, preferably AWS or GCP.
  • Expertise in one or more scripting or programming languages (e.g., Python, Ruby, Bash).
  • Deep understanding of monitoring and alerting systems (e.g., Prometheus, Grafana, Alertmanager).
  • Experience with containerization technologies (e.g., Docker, Kubernetes).
  • Excellent problem-solving and troubleshooting skills.
  • Strong communication and collaboration skills.
  • Ability to thrive in a fast-paced, dynamic environment.

KC Overseas Education


Job Detail

  • Job Id
    JD3078495
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Hyderabad, Telangana, India
  • Education
    Not mentioned
  • Experience
    Year