Responsibilities: A day in the life of an Infoscion
As a Senior Site Reliability Engineer, you will play a critical role in supporting application developers by providing expert guidance on application and infrastructure best practices from a reliability perspective.
Improve reliability, quality, and time-to-market of our suite of products/applications.
Define suitable SLIs and SLOs for each system and set up observability mechanisms to track them
Define error budgets in line with the SLOs
Define the strategy for, and set up, high-availability and load-balancer-based architectures
Drive a metrics-driven culture and software delivery process using data to measure overall system quality and reliability.
Balance feature development speed and reliability with well-defined service level objectives
Provide primary operational support and engineering for products/applications
Partner with solution architects and development teams to improve service reliability
Participate in system design
Participate in optimizing code, automating operational tasks and toil reduction
Provide solutions for performance management, monitoring and observability
Work with business users to understand issues, perform root cause analysis, and work with the development team on enhancements and fixes
Work with distributed traces to visualize the entire workflow and analyze the cause of problems and incidents
Improve security and performance of applications
Define, evangelize, and maintain SRE best practices
Design and implement DevSecOps best practices
Improve automation, including the system's self-healing capabilities
Manage and participate in on-call response for priority incidents, if required
If you think you fit right in to help our clients navigate their next in their digital transformation journey, this is the place for you!
Additional Responsibilities:
AIOps and related tools
Experience in container orchestration tools and practices, including Kubernetes and Docker Swarm
Experience in infrastructure automation tools like Terraform, CloudFormation, Ansible, and Puppet (any one)
Knowledge of SQL and NoSQL databases (Oracle, Couchbase)
Experience working on ITSM tools like Remedy, ServiceNow, Confluence, Jira
Experience with Cloud cost optimization / FinOps
Technical and Professional Requirements:
Must have at least 5 years of SRE experience in large programs, with a focus on release engineering, observability, and reliability
Reliability practices
Chaos engineering
Strong experience with one or more observability tools like New Relic, AppDynamics, Prometheus, Dynatrace, Datadog, Splunk
Experience in event correlation using observability or other tools like BigPanda
Experience in observability dashboard creation, custom metrics, Synthetic Monitoring, and Real User Monitoring (RUM)
Good experience in scripting or development languages, including expertise in Python, Ruby, JSON, Java, Node.js, or PHP (any one)
Experience with scripting in PowerShell(M) and Bash/Shell/Perl (any one)
Strong knowledge of application design and architecture including microservices architecture
Experience in CI/CD tooling and best practices
Experience with cloud platforms such as AWS, Azure, and Google Cloud
Preferred Skills: Foundational->Configuration Management->Configuration Management->Ansible, Technology->Infra_ToolAdministration-Others->Splunk Admin, Technology->Infra_ToolAdministration-PerformanceManagement->AppDynamics, Technology->Infra_ToolAdministration-PerformanceManagement->Dynatrace
Generic Skills: Technology->Infra_ToolAdministration-ITSM->ServiceNow, Technology->OpenSystem->Python - OpenSystem->Python
Educational Requirements: Bachelor of Engineering, BTech, Bachelor of Science, Master of Engineering, Master of Technology
Service Line: Infosys Cobalt Unit