A day in the life of an Infoscion
As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective
Improve the reliability, quality, and time to market of our suite of products/applications
Define suitable SLO/SLI metrics for each system and set up observability mechanisms to track them
Define error budget as per the SLO
Define the strategy for, and set up, high-availability and load-balancer-based architecture
Drive a metrics-driven culture and software delivery process, using data to measure overall system quality and reliability
Balance feature development speed and reliability with well-defined service level objectives
Provide primary operational support and engineering for products/applications
Partner with solution architects and development teams to improve service reliability
Participate in system design
Participate in optimizing code, automating operational tasks, and reducing toil
Provide solutions for performance management, monitoring, and observability
Work with business users to understand issues, develop root cause analyses, and work with the development team on enhancements/fixes
Work with distributed traces to visualize the entire workflow and analyze the cause of problems/incidents
Improve security and performance of applications
Define, evangelize, and maintain SRE best practices
Design and implement DevSecOps best practices
Improve automation, including the system's self-healing capability
Manage and participate in on-call incidents (priority incidents) if required
If you think you fit right in to help our clients navigate their next in their digital transformation journey, this is the place for you.
Technical Requirements:
---------------------------
Must have at least 5 years of SRE experience in large programs, with a focus on release engineering, observability tasks, and reliability
Reliability practices
Chaos engineering
Strong experience with one or more observability tools such as New Relic, AppDynamics, Prometheus, Dynatrace, Datadog, or Splunk
Experience in event correlation using observability or other tools such as BigPanda
Experience in observability dashboard creation, custom metrics, synthetic monitoring, and Real User Monitoring (RUM)
Good experience in scripting or development languages, including expertise in any one of Python, Ruby, JSON, Java, Node.js, or PHP
Experience with scripting in any one of PowerShell, M, Bash/shell, or Perl
Strong knowledge of application design and architecture, including microservices architecture
Experience in CI/CD tooling and best practices
Experience with cloud platforms such as AWS, Azure, and Google Cloud
Additional Responsibilities:
--------------------------------
AIOps and related tools
Experience in container orchestration and practices, including Kubernetes and Docker Swarm
Experience in infrastructure automation tools such as Terraform, CloudFormation, Ansible, or Puppet (any one)
Knowledge of SQL/NoSQL databases (Oracle, Couchbase)
Experience working with ITSM tools such as Remedy, ServiceNow, Confluence, and Jira
Experience with cloud cost optimization (FinOps)