Develop custom software solutions to design, code, and enhance components across systems or applications. Use modern frameworks and agile practices to deliver scalable, high-performing solutions tailored to specific business needs.
Must have skills: AWS Architecture
Good to have skills: Python (Programming Language), DevOps Architecture, Ansible on Microsoft Azure
Minimum 7.5 year(s) of experience is required
Educational Qualification: 15 years full time education
Scope of the role
o AWS operations: EC2, EKS, RDS, ALB/CloudFront, IAM/OIDC, VPC/TGW/SGs, patching, and hygiene.
o Application support: release readiness, runbooks, post-deploy smoke checks, performance baselines, and clean rollback paths.
o Visibility: dashboards, logs, metrics, traces, synthetics, error budgets, and alert health.
o Backup & DR: policies, schedules, retention, cross-region copies, restore testing, and DR runbooks (RPO/RTO owned and measured).
o Incident leadership: run Sev-1/2 bridges, keep comms clear, and land post-mortems with actions that actually close.
o Cost hygiene: tagging, right-sizing, SP/RI coverage, lifecycle cleanups (EBS/EIP/AMIs).
o Team enablement: guardrails, golden runbooks, and small automations that remove toil.

Day-to-day (what this looks like)
o Triage overnight alerts and hot issues, set priorities, and make sure owners are clear.
o Keep dashboards honest; fix flapping or missing alerts before they wake people up.
o Check backups and recent restore points; open tickets for any gaps and track to done.
o Unblock releases; verify smoke checks; keep environments tidy and predictable.
o Lead or delegate break/fix; no lingering "mystery" incidents.
o Write down what we learned in the runbook so the next person can fix it faster.

Weekly rhythm
o Ops review: incidents, alerts, deploys, costs, capacity, and backup status in one short readout.
o Observability tune-up: delete noise, add the missing signal, and test a synthetic from the edge.
o Backup/DR: run a small restore test and record RPO/RTO evidence.
o Patch and change review: what shipped, what rolled back, why.

Monthly outcomes
o Share availability/SLOs, MTTR, change failure rate, observability coverage, backup compliance, and costs in plain English.
o Close the top recurring issues (noisy alerts, flaky deploys).
o Refresh the most-used runbooks; validate DR for one critical workload (tabletop or live restore).
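
For candidates who want a concrete picture of the monthly numbers named above, here is a minimal Python sketch of how MTTR, change failure rate, and an SLO error budget are commonly computed. It is an illustration only, not the team's actual tooling: the function names, the input shapes, and the 30-day reporting window are all assumptions.

```python
from datetime import datetime, timedelta


def mttr(incidents):
    """Mean time to restore: average of (resolved - opened) per incident.

    `incidents` is a hypothetical list of (opened, resolved) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    durations = [resolved - opened for opened, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def change_failure_rate(total_changes, failed_changes):
    """Share of changes that failed or were rolled back (0.0 to 1.0)."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return failed_changes / total_changes


def error_budget_minutes(slo, period_days=30):
    """Allowed downtime in minutes for an availability SLO over a period.

    The 30-day default is an assumed reporting window.
    """
    return (1 - slo) * period_days * 24 * 60


if __name__ == "__main__":
    sample = [
        (datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 1, 0)),
        (datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 2, 0, 30)),
    ]
    print("MTTR:", mttr(sample))
    print("Change failure rate:", change_failure_rate(40, 3))
    print("Error budget (min):", error_budget_minutes(0.999))
```

A monthly readout would typically place these figures next to the previous month's values so the trend is visible.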

Core responsibilities
o Own production readiness and stability for assigned AWS accounts and apps.
o Lead incidents and land post-mortems; make the fixes stick.
o Keep monitoring/logging/tracing standards real; enforce SLOs and error budgets.
o Own backup strategy end-to-end, including monthly restore tests and DR docs.
o Keep access least-privileged and auditable; rotate secrets and certs on time.
o Drive cost posture and mentor the team; make on-call humane.

What "good" looks like
o Visibility: one clear dashboard per service, clean alert routing, low false positives.
o Backups: 100% jobs green (or retried), documented RPO/RTO, and monthly restore tests that pass.
o Reliability: MTTR trending down; most issues solved by the first responder with a runbook.
o Change: predictable releases with smoke and rollback; fewer failed changes month over month.
o Cost: flat or down against growth; tagging at or above 95%.

Experience we're looking for
8-10+ years in cloud/app operations with strong AWS hands-on. Comfortable leading incidents, shaping dashboards and alerts, and automating the boring bits (Terraform, Ansible, Python). Experience running backups/DR in AWS and proving it with real restore tests. Cloud network experience.