Play a key role in ensuring system reliability at one of the world's largest and most iconic financial institutions.
As a Site Reliability Engineer II at JPMorgan Chase within the Corporate Technology, Finance Last Mile Reporting team, you will use technology to solve business problems and apply software engineering best practices as we strive towards excellence. In this role you will often work independently to execute small to medium projects, but you'll also have the opportunity to collaborate with cross-functional teams and continually deepen your knowledge of JPMorgan Chase's business and relevant technologies.
Job responsibilities
Design, implement, and maintain scalable, highly available, resilient systems for the banking domain
Establish and improve SRE best practices, including monitoring, alerting, incident response, and automation
Collaborate with Development and Business Operations teams to enhance system reliability, performance, and scalability
Implement DevOps practices such as CI/CD pipelines, infrastructure as code, and automated deployments
Monitor and improve system observability using tools such as Prometheus, Grafana, the ELK stack, Dynatrace, and Control-M
Analyze system failures and conduct root cause analysis to prevent future incidents
Optimize system performance and ensure compliance with banking security and regulatory standards
Lead incident management and troubleshooting efforts, ensuring minimal service disruption
Leverage technology to solve business problems by writing high-quality, maintainable, and robust code that follows software engineering best practices
Recognize the toil within your role and proactively work to eliminate it through either systems engineering or updates to application code
Understand observability patterns and strive to implement and improve service level indicators and objectives, monitoring, and alerting solutions for optimal transparency and analysis
Implement and refine error budgets and SLIs, SLOs, and SLAs to improve reliability
Required qualifications, capabilities, and skills
Formal training or certification in site reliability engineering concepts and 2+ years of applied experience
Ability to code in at least one programming language such as Python or Java, and an understanding of SQL and databases such as Oracle or MySQL
Experience maintaining and working on cloud-based infrastructure
Strong experience with site reliability concepts, principles, and practices
Exposure to observability practices such as white-box and black-box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
Good knowledge of containers or a common server OS such as Linux or Windows
Strong knowledge of continuous integration and continuous delivery tools such as Jenkins, GitLab, Terraform, and Ansible, and of common networking technologies
Ability to work in a large, collaborative team and willingness to voice ideas to peers and managers
Experience with microservice architecture and container orchestration tools such as Kubernetes and Docker Swarm
Experience in Databricks data engineering, data warehousing concepts, and ETL processes (job runs, data ingestion, Delta Live Tables, Spark Streaming)
Ability to demonstrate and apply existing and new system processes, methodologies, and skills to contribute to the development of systems
Preferred qualifications, capabilities, and skills
General knowledge of the financial services industry
Strong problem-solving and analytical skills
Ability to work in high-pressure environments and manage incidents effectively
Passion for continuous improvement and automation