We are seeking a highly skilled Principal Site Reliability Engineer to join our team. The ideal candidate will have a Bachelor's or Master's degree in Computer Science, Information Technology, or a related field (or equivalent experience) and 15+ years of experience in DevOps, Infrastructure, or Site Reliability Engineering roles. The candidate should also have 4+ years in a senior or principal-level capacity driving SRE or reliability automation initiatives, and a proven track record of designing and scaling large distributed, cloud-native platforms. Telecom domain experience is a plus.
Skills:
Deep expertise in AWS (EKS, EC2, RDS, IAM, VPC, Kafka, CloudWatch, API Gateway, Lambda, WAF, KMS) and container orchestration on EKS.
Deep expertise in Helm charts.
Hands-on experience with APM tools (Elastic APM preferred).
Expert-level proficiency in Terraform, Jenkins, Bitbucket, and Python/Bash/Go scripting for automation.
Strong understanding of SLO/SLI frameworks, error budgets, and observability design.
Familiarity with AIOps, chaos engineering, and event-driven automation.
Proven experience in performance optimization, capacity planning, and resilience testing.
Excellent documentation and system design communication skills.
Accreditation/certifications/licenses:
AWS Certified Solutions Architect - Professional or DevOps Engineer - Professional.
Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD).
Preferred: SRE Foundation / Google SRE / Dynatrace Performance Professional / Elastic Certified Engineer.