Key Responsibilities
- Own and manage AKS-based Kubernetes clusters (multi-tenant, with namespace isolation).
- Implement and maintain GitOps workflows using FluxCD and Helm.
- Manage infrastructure as code with Terraform.
- Build and operate the observability stack (Prometheus, Grafana, Loki, Tempo) and integrate it with external tools (Datadog, Dynatrace, Grafana Cloud).
- Implement application observability and Real User Monitoring (RUM) to improve client experience.
- Automate CI/CD pipelines (GitHub Actions) and optimize build/deployment flows.
- Ensure secure platform operations (RBAC, secrets, TLS/mTLS, Azure Key Vault).
- Collaborate with SRE/Support engineers to troubleshoot production issues.
- Mentor other engineers on observability, incident response, and reliability practices.
Requirements
Required Skills
- Strong hands-on experience with Kubernetes (AKS preferred) and GitOps (FluxCD, ArgoCD).
- Infrastructure as Code: Terraform, Ansible.
- Experience implementing observability platforms (Dynatrace, Datadog, Grafana Cloud) from scratch.
- Experience with application observability and Real User Monitoring (RUM).
- Observability: Prometheus, Grafana, Loki, Tempo, OpenTelemetry.
- CI/CD: GitHub Actions, Jenkins.
- Cloud: Azure (preferred), AWS, GCP basics.
- Linux, networking, containerization (Docker).
- Problem-solving in production environments.
Good to Have
- Experience with Temporal workflows and stateful workloads (PostgreSQL).
- Knowledge of cost optimization and performance tuning in cloud-native infrastructure.