Site Reliability Engineer II

Chennai, Tamil Nadu, India

Job Description

OUR VISION: THE WORLD. SUBSCRIBED.

Customers have changed. They're looking for new ways to engage with businesses. Consumers today have a new set of expectations. They want outcomes, not ownership. Customization, not generalization. Constant improvement, not planned obsolescence. In the old world (let's call it the Product Economy) it was all about things: acquiring new customers, shipping commodities, billing for one-time transactions. But in today's new era, it's all about relationships. More and more customers are becoming subscribers because subscription experiences built around services meet consumers' needs better than static offerings or single products. Our vision is "The World Subscribed," where one day every company will be part of the Subscription Economy® (a phrase coined by our CEO, Tien Tzuo, author of the best-selling book Subscribed).

THE TEAM

Site Reliability Engineers at Zuora play a critical and visible role in delivering and supporting our platform. We are responsible for scaling and optimizing the reliability, availability, and performance of our infrastructure and platform services, and for partnering with Engineering teams to build highly available and performant services. We work with amazing developer teams on the design, provisioning, integration, configuration, monitoring, and incident response of large-scale distributed applications and platform services. We deliver kickass SaaS.

WHAT YOU'LL ACHIEVE

As a Senior SRE, you will be a member of a team that understands the configuration, technical dependencies, and overall behavioral characteristics of production services. In partnership with developers, you are responsible for ensuring that services are designed and delivered with a focus on security, resiliency, scale, and performance. SREs are the ultimate authority and are accountable for the end-to-end performance and operability of the services they own.

Champion service reliability and prevention

You will be part of the team whose mission is the shared ownership of a collection of services and technology areas, in partnership with developer teams.
Service restoration: You are a key escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs) for L1 staff. You will often be called in as a Subject Matter Expert (SME) during major incidents when the source of a problem is unclear. You will have the deep understanding of service topology and dependencies required to troubleshoot issues and define mitigations. You will help maintain up-to-date documentation on deployments, processes, and SOP runbooks.
Prevention: Once you have expertly resolved an issue, you will immediately work out how to resolve it more quickly next time, with the goal of preventing the problem from recurring. You will drive the discovery and implementation of automated and self-healing solutions.

Service design and implementation

You will partner with development Scrum teams in defining and implementing improvements to both current and future service architecture. You will be an expert at articulating the technical characteristics of services and their dependencies, and you will guide development teams to engineer highly reliable and performant services.
You will frequently partner with developer Scrum teams and actively participate in the tasks required to meet the milestones and deliverables the team sets throughout a release cycle.

Operations Engineering

You will own the reliability and performance of one or more services. You will understand, and be able to communicate, the capacity, scale, security, and performance attributes and requirements of the services you own. You are an SME, able to understand and communicate the characteristics of your service stack, such as:
- Degradation and behavior under load of the services and their dependencies
- End-to-end tuning needs, optimizing resource utilization as load patterns fluctuate
- Instrumentation and metrics that clearly describe the service behaviors
- Scaling requirements and patterns
- Resiliency and recoverability, ensuring that backup / restore and disaster recovery capabilities are implemented, tested and maintained

You will take part in a shared on-call rotation that won't cripple your life or kill your soul.

WHAT YOU'LL NEED TO BE SUCCESSFUL

SREs are a rare mix of sysadmins and development engineers, and as such you are able to understand and explain how product architecture decisions affect a service's ability to run as a distributed system. You are driven by professional curiosity and a desire to develop a deep understanding of the services and the technologies they depend on. You are proactive, self-motivated, customer-focused, organized, and a good communicator. You demonstrate competence in shell scripting (such as Bash) and high-level programming languages (such as Python); we use Python extensively. You have more than 4 years of experience running large-scale, customer-facing web services, with a solid understanding of:

- REST APIs
- Linux/Unix system internals
- Load balancing technologies, including L7 routing, DNS, and CDN
- Networking and TCP/IP
- Server hardware configuration
- Monitoring and instrumentation, including critical instrumentation and alerting
- Standard Internet services, such as DNS and HTTP
- Cloud computing patterns
- Configuration management using Puppet, Chef, Ansible, or similar
- IT security and compliance
- Containers and container management

You demonstrate practical knowledge of various aspects of distributed service design, including messaging protocols, caching strategies, persistence technologies, and queuing. You have experience with AWS services such as EC2, ELB, ElastiCache, DynamoDB, SQS, SNS, RDS, and S3. You are passionate about automation. Your head is full of customer-delighting ideas for the next hackathon. An ideal candidate will also have experience with:

- Containers and container management technologies, such as Docker and Kubernetes
- Databases and big data stores
- Defining and documenting the technical architecture of complex, highly scalable products
- ITIL-based incident, problem, and change management
- Working with large global teams, and coordinating well within and across development teams

ABOUT ZUORA & OUR "ZEO" CULTURE

Zuora (NYSE: ZUO) provides the leading cloud-based subscription management platform that functions as a system of record for subscription businesses across all industries. Powering the Subscription Economy®, the Zuora platform was architected specifically for dynamic, recurring subscription business models and acts as an intelligent subscription management hub that automates and orchestrates the entire subscription order-to-revenue process seamlessly across billing and revenue recognition. Zuora serves more than 1,000 companies around the world, including Box, Ford, Penske Media Corporation, Schneider Electric, Siemens, Xplornet, and Zoom.

At Zuora, we have one CEO, but every employee is empowered and supported to be the 'ZEO' of their own career experience. By embedding inclusion and belonging into our processes, policies, and culture, we are building a workplace where our 1,200+ ZEOs across North America, Europe, and APAC can bring all the elements of who they are into their work. In addition to an industry-leading six-month, 100% paid parental leave for all our ZEOs, we also offer programs to support your mental health and give back to our communities, along with "career cash" and plenty of learning and development opportunities. To learn more, visit www.zuora.com.

Think, be and do you! At Zuora, different perspectives, experiences and contributions matter. Everyone counts. Zuora is proud to be an Equal Opportunity Employer committed to creating an inclusive environment for all.
Zuora does not discriminate on the basis of, and considers individuals seeking employment with Zuora without regard to, race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics. We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us by sending an email to assistance(at)zuora.com.

Beware of fraud agents! Do not pay money to get a job.

MNCJobsIndia.com will not be responsible for any payment made to a third-party. All Terms of Use are applicable.

Job Detail

Job Id: JD2870532
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: Chennai, Tamil Nadu, India
Education: Not mentioned
Experience: Year