Ensure the scalability, availability, and resilience of mission-critical systems in Azure and Kubernetes environments.
Build SLOs/SLIs and establish reliability best practices following SRE principles.
Perform root-cause analysis, incident response, post-mortems, and continuous improvement activities.
Automate operational tasks and reduce toil through scripting and infrastructure automation.
2. Kubernetes Platform Management
Deploy, configure, and manage workloads on Azure Kubernetes Service (AKS) or self-managed Kubernetes clusters.
Manage cluster upgrades, node pools, RBAC, secrets, ingress controllers, autoscaling, and capacity planning.
Implement GitOps or automated deployment workflows for Kubernetes manifests or Helm charts.
Optimize cluster performance, networking, and security.
3. CI/CD Pipeline Development
Build and maintain CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or Jenkins.
Implement automated testing, build pipelines, artifact management, and secure deployment workflows.
Integrate CI/CD with Kubernetes, container registries, and infrastructure automation.
Enforce DevOps best practices, including versioning, release automation, and rollbacks.
4. Monitoring, Observability & Alerting
Implement and maintain observability stacks using Prometheus, Grafana, Alertmanager, Loki, or similar tools.
Create metrics dashboards, alerts, and performance monitoring for both applications and infrastructure.
Develop logging, tracing, and telemetry systems for full-stack visibility.
Monitor capacity, resource utilization, cluster health, and system performance.
5. Azure Cloud Engineering
Design and maintain Azure cloud infrastructure: virtual networks, VM scale sets, load balancers, storage, and identity management.
Implement infrastructure-as-code solutions using Terraform, Bicep, or ARM templates.
Ensure compliance, governance, scaling, and cost optimization across cloud resources.
Integrate Azure services (Key Vault, Monitor, Log Analytics, Container Registry, Service Bus, etc.) into platform operations.
Required Skills & Experience
---------------------------------
3 to 8 or more years of experience in SRE, DevOps, Cloud Engineering, or related roles.
Strong hands-on experience with Kubernetes (AKS preferred) and cloud-native architectures.
Proficiency with Azure cloud services and infrastructure.
Solid experience building and maintaining CI/CD pipelines.
Deep knowledge of Grafana and Prometheus for monitoring and observability.
Strong scripting/automation skills in Bash, Python, or PowerShell.
Experience with containers (Docker), Git, and distributed systems.