Site Reliability Engineer
Location: Whitefield, Bengaluru
Company Overview
Founded in 2000, Optym builds SaaS solutions that make the transportation and logistics industry more efficient. Optym's software solutions are used by leading railroads, airlines, and trucking companies, and have created a cumulative business value of over $1 billion for its clients. Headquartered in Dallas, Texas, with centers of excellence in Europe and India, Optym has a team of 250+ professionals, including about 50 highly specialized professionals in the US, and expects major growth over the next five years.
Optym is looking for a brilliant, highly qualified, and well-educated Site Reliability Engineer to assist in managing Optimization and AI/ML solutions for the transportation and logistics industry.
Optym offers competitive wages, excellent benefits, a great working environment, and a culture of entrepreneurship and ownership. Optym also offers a generous profit and equity sharing plan with the potential to increase your compensation substantially based on the company's success.
Responsibilities
Monitoring & Observability
Continuously monitor system health using tools like Azure Monitor and Datadog.
Define and track SLIs/SLOs; ensure dashboards reflect operational KPIs.
Document trends and propose improvements to performance and reliability.
Incident Response & Alerting
Manage alerts for accuracy and responsiveness.
Respond to incidents quickly and conduct root cause analysis (RCA).
Lead post-incident reviews and implement corrective actions.
Automation & Reliability
Automate manual ops tasks to improve system resilience.
Build self-healing, fault-tolerant infrastructure.
Contribute to system design for reliability and scalability.
CI/CD & Release Engineering
Build and maintain CI/CD pipelines with rollback and canary strategies.
Collaborate with dev teams for smooth and repeatable deployments.
Infrastructure as Code
Use Terraform/ARM templates to manage cloud infrastructure.
Apply version control and peer reviews for infra changes.
Cloud Operations (Azure)
Support Azure cloud infrastructure across environments.
Handle deployments, networking, and basic troubleshooting.
Cost & Capacity Management
Right-size infrastructure and optimize cloud spend.
Forecast usage and ensure performance under growth.
Collaboration & Knowledge Sharing
Maintain updated documentation and runbooks.
Mentor peers and support cross-team collaboration.
Participate in on-call rotations as needed.
Requirements
B.Tech/B.E. in Computer Science, BCA, IT, or a related field
2-4 years of overall experience, with a minimum of 1 year of hands-on experience with Azure
Minimum of two years' experience in production support
Ability to work independently at a fast pace, as well as in a team environment, in a variety of project settings
Willingness to continuously learn new skills where required
Effective communication skills, as this role requires extensive communication across different domains
Ability to work flexible hours and with globally distributed teams
On-call support: address critical incidents and maintenance activity outside of regular business hours
Mandatory Skills
Hands-on experience with Azure cloud infrastructure administration and support
Kubernetes, Docker, PowerShell, Python, Azure ARM templates
Proactive infrastructure monitoring with tools like Azure Monitor and Datadog
Proactive application monitoring with tools like Application Insights and Datadog
Triaging issues with Azure Support and the Application Team
Ability to work in Linux/Unix environments
Preferred Certifications
Microsoft Azure Administrator - Exam AZ-103/104
Certified Kubernetes Administrator
Desirable Skills
Server Environment: Windows Server 2012 and above, CentOS 7 and above
Virtualization environment: Basic knowledge of VMware vSphere and vCenter
Experience in managing full three-tier application stacks from the OS up through custom applications
Database administration and troubleshooting for databases such as MS SQL and PostgreSQL