Site Reliability Engineer

Mumbai, Maharashtra, India

Job Description

Our Team:
Technology drives our business. Our team is made up of talented software engineers, infrastructure engineers, leaders and UX professionals. We care about technology as a craft and a differentiator. We bring our global products to market with a mix of software, cloud, data centers, infrastructure, design and grit.
Our Product Groups:
Individual Investor - building products like Morningstar.com and mobile apps for individuals like yourself
Institutional Investor - developing some of our flagship products like Morningstar Direct for institutional investors and our Advisor products for financial advisors
Workplace - this is where we build and provide our hosted digital advice platform for retirement plans, 401(k)s, etc. (what some call robo-advisors)
Data - this is the heart of Morningstar where all data is sourced, collected, transformed, calculated and distributed across the world
What You'll Do:
As a Site Reliability Engineer (SRE)/DevOps Engineer on our data and analytics team, you will work on the availability, automation, performance, efficiency, scaling, monitoring and emergency response of the core systems that store data at Morningstar. You will build a deep understanding of platforms, architecture, people, systems, and processes to both establish and continuously improve SLIs and SLOs for uptime, performance, deployment, monitoring, and troubleshooting.
Your Day to Day:

  • Maintain and support the product and data systems: proactively monitor events, investigate issues, analyze solutions, and drive problems through to resolution.
  • Develop tools and reporting as needed by projects and operations.
  • Work with product teams to define application hardening and identify opportunities for chaos engineering.
  • Use operational tools and monitoring platforms to gain in-depth knowledge, understanding, and ongoing monitoring of system availability, performance, and capacity.
  • Implement an alerting strategy that makes alerts actionable and unique.
  • Provide follow-through to ensure issues are resolved to satisfaction.
  • Contribute to continuous improvement and innovation within the team.
  • Bring a sense of ownership, initiative and drive.
Basic Qualifications:
Bachelor's degree or higher with some experience in a technical support role.
You have been working in technology for 0-2 years.
Responsibilities:
  • First-level support for data triage and issues
  • Support for other teams (all data consumers)
  • Review data logs and manifests; track the lineage of data changes.
  • Identify causes of data changes and report them to the owners of those changes.
  • Understand the event framework and triage events in the audit DB
  • Access Management
  • Include entitlement access (EAMS)
  • Release management - deployment checklists
  • Support for data lake releases, dashboard changes, etc.
  • Coordinate with Data Lake DevOps in Mumbai around releases
  • Event and Incident management - Alerts and Incidents
  • RCA-style contributions: why did this data move? Do we need to make changes to the pipelines, QC checks, etc.?
  • Incident commander for P1/P2 incidents
  • Drive continuous improvement by assessing trends in metrics such as MTTA and MTTR
  • Monitoring, data thresholds/coverage checks
  • Building and monitoring dashboards and alerts
  • Contribute to the mechanical testing of changes (row counts, nulls, schema breaks, etc.)
  • Help with deep integration with the QC framework
  • Ops readiness checklists
  • Ensure adherence to standards, architecture diagrams, the dataset catalog, data contracts, logging standards, etc.
  • Dashboards for data status, workflow status for all data movements in all zones
  • Etleap will be building some dashboards out; we'll need to understand and outline any overlap here
  • Tools for viewing data for troubleshooting purposes, like XOI viewer
  • Same as above
  • Ad-hoc operations project coordination
  • Server maintenance and upgrades
  • Application and server log management
  • Disaster recovery plans and events
  • Security events and patching
Preferred Qualifications:
Experience in Python or other scripting languages
Experience with AWS: S3, SNS, SQS, DynamoDB, Glue, Lake Formation, Spark, SQL
Experience with Linux, Parquet, Avro and ORC formats
Knowledge of monitoring tools and strategy, ideally VictorOps, New Relic, CloudWatch, and Splunk
Experience running incident post-mortems
Understanding of automated deployment processes leveraging Terraform, Jenkins
Please include a cover letter describing your passion for engineering operations and for participating in building efficient, reliable systems.
Morningstar is an equal opportunity employer.
Morningstar's hybrid work environment gives you the opportunity to work remotely and collaborate in-person each week. We've found that we're at our best when we're purposely together on a regular basis, at least three days each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you'll have tools and resources to engage meaningfully with your global colleagues.
Legal Entity: Morningstar India Private Ltd. (Delhi)


Job Detail

  • Job Id: JD3776243
  • Industry: Not mentioned
  • Total Positions: 1
  • Job Type: Full Time
  • Salary: Not mentioned
  • Employment Status: Permanent
  • Job Location: Mumbai, Maharashtra, India
  • Education: Not mentioned
  • Experience: 0-2 years