to work with one of the leading financial services organizations in the US. This role involves managing the end-to-end application and system stack, ensuring high reliability, scalability, and performance of distributed systems. As an SRE, you will combine software engineering and systems engineering to build and operate large-scale, fault-tolerant production environments.
Key Responsibilities
Engage in and improve the software development lifecycle, from design and development to deployment, operations, and refinement.
Design, develop, and maintain large-scale infrastructure, CI/CD automation pipelines, and build tools.
Influence infrastructure architecture, standards, and methods for highly scalable systems.
Support services prior to production through
.
Maintain and monitor services in production by tracking key performance indicators (availability, latency, system health).
Automate scalability, resiliency, and system performance improvements.
Investigate and resolve performance and reliability issues across large-scale and high-throughput services.
Collaborate with architects and engineers to ensure applications are scalable, maintainable, and follow DR/HA strategies.
Create and maintain documentation, runbooks, and operational guides.
Implement corrective action plans with a focus on sustainable, preventative, and automated solutions.
Requirements
Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
8+ years of experience as a Site Reliability Engineer or in a similar role.
Strong hands-on expertise in Google Cloud Platform (GCP); experience with AWS is a plus.
Proficiency in DevOps practices, CI/CD pipelines, and build tools (e.g., Jenkins).
Solid understanding of container orchestration (Docker, Kubernetes).
Familiarity with configuration management and deployment tools (Chef, Octopus, Puppet, Ansible, SaltStack, etc.).
Strong cross-functional knowledge of systems, storage, networking, security, and databases.
Experience operating production environments at scale with a focus on availability and latency.
Excellent communication, collaboration, and problem-solving skills.
Strong system administration skills on Linux/Windows, with automation and orchestration experience.
Hands-on experience with infrastructure as code (Terraform, CloudFormation).
Proficiency in CI/CD tools and practices.
Preferred / Nice-to-Have
Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
Passion for automation and eliminating manual toil.
Experience working in highly secure, regulated, or compliant industries.
Knowledge of security and compliance best practices.
Experience in a DevOps culture, thriving in collaborative and fast-paced environments.
Skills
GCP, AWS, Jenkins, Kubernetes
About UST
UST is a global digital transformation solutions provider. For more than 20 years, UST has worked side by side with the world's best companies to make a real impact through transformation. Powered by technology, inspired by people and led by purpose, UST partners with its clients from design to operation. With deep domain expertise and a future-proof philosophy, UST embeds innovation and agility into its clients' organizations. With over 30,000 employees in 30 countries, UST builds for boundless impact, touching billions of lives in the process.