Senior Engineer

KA, IN, India

Job Description

We are seeking a reliable and forward-thinking Senior Cloud Operations Engineer to help us scale and stabilize the AWS cloud infrastructure that powers our growing suite of SaaS products. In this role, you will own, maintain, and improve production and development environments in AWS, ensuring they are secure, performant, and highly available.


You will play a critical role in our operational strategy by responding to infrastructure incidents, implementing preventive improvements, and driving infrastructure as code (IaC) practices using Terraform, Ansible, Python, and Bash. You will also be accountable for ensuring production readiness of all systems you manage, including monitoring coverage, rollback capability, environment standardization, and support documentation.


As our environment grows in complexity, we rely on well-defined processes, IaC-driven automation, and clear production readiness standards to maintain stability and operational discipline. You will collaborate with development and platform teams to align infrastructure changes with service delivery needs and long-term architecture goals.


This role directly supports service uptime, deployment velocity, and customer satisfaction, and plays a vital part in the stability and scalability of our entire SaaS platform.


Required Qualifications


- 7+ years of experience in cloud operations, infrastructure engineering, or site reliability roles.


- Active AWS Certification required (e.g., Solutions Architect, DevOps Engineer, SysOps Administrator - Associate or Professional).


- Deep hands-on experience with core AWS services, including EC2, S3, IAM, RDS, VPC, Lambda, and CloudFormation.


- Strong Linux administration skills, including system hardening, network configuration, and OS-level performance tuning.


- Proficient in Python and Bash scripting for automation and diagnostics.


- Significant experience implementing and supporting Infrastructure as Code (IaC), particularly with Terraform and Ansible.


- Demonstrated success operating and supporting production systems in high-uptime, high-scale environments.


- Clear understanding of what constitutes a production-ready service, and the ability to enforce those standards consistently.


Preferred Qualifications


- Experience operating containerized infrastructure (e.g., Docker, ECS, or EKS).


- Familiarity with AWS cost governance, tagging, and usage reporting practices.


- Exposure to CI/CD pipelines and infrastructure automation within Git-based workflows.


- Structured, systems-oriented thinking, with a passion for operational excellence, process discipline, and scaling infrastructure through automation.





Key Responsibilities


- Operate and improve production and development infrastructure in AWS, with full ownership of high-impact services and components.


- Design, implement, and manage infrastructure using Infrastructure as Code (IaC) tools such as Terraform and Ansible, following automation best practices.


- Troubleshoot and resolve infrastructure-level issues impacting availability, performance, or configuration.


- Respond to infrastructure incidents, perform root cause analysis, and implement long-term preventive solutions.


- Design and maintain monitoring and alerting systems to ensure visibility into environment health and to reduce time to detection and resolution.


- Enforce production readiness standards, including rollback support, monitoring coverage, patching, configuration consistency, and documentation.


- Apply structured change control and release practices to minimize operational risk during deployments and upgrades.


- Collaborate with development teams to ensure infrastructure supports service delivery, scale, and architectural alignment.


- Execute infrastructure upgrades, environment builds, and configuration hardening based on evolving business and security needs.


- Participate in an on-call rotation, supporting 24/7 infrastructure availability and improving operational response over time.


- Maintain up-to-date documentation, runbooks, and post-incident reviews to promote repeatability and team-wide alignment.







Job Detail

  • Job Id
    JD4242321
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    KA, IN, India
  • Education
    Not mentioned
  • Experience
    Year