Senior DevOps Engineer, ML Engineering Support

KA, IN, India

Job Description

Teamwork makes the stream work.


-----------------------------------




Roku is changing how the world watches TV





Roku is the #1 TV streaming platform in the U.S., Canada, and Mexico, and we've set our sights on powering every television in the world. Roku pioneered streaming to the TV. Our mission is to be the TV streaming platform that connects the entire TV ecosystem. We connect consumers to the content they love, enable content publishers to build and monetize large audiences, and provide advertisers unique capabilities to engage consumers.



From your first day at Roku, you'll make a valuable - and valued - contribution. We're a fast-growing public company where no one is a bystander. We offer you the opportunity to delight millions of TV streamers around the world while gaining meaningful experience across a variety of disciplines.




About the Role





We are seeking a talented and experienced Senior Software Engineer, DevOps/SRE to join our dynamic team and play a critical role in supporting Machine Learning Engineering activities. The ideal candidate will have a strong background in DevOps practices, cloud infrastructure management, automation, and MLOps tooling, along with team leadership skills.



If you have a proven track record architecting and scaling ML/AI platforms, enjoy solving intriguing system challenges at internet-scale, are innovative at heart, and thrive in building infrastructure that accelerates ML experimentation and deployment -- this role might be a great fit for you!



What You'll Be Doing




  • Provide technical leadership and guidance to DevOps/SRE engineers supporting ML Engineering initiatives; mentor team members in best practices, technologies, and methodologies.
  • Design, implement, and maintain scalable and resilient cloud infrastructure (AWS & GCP) optimized for ML workloads, including GPU/TPU orchestration and distributed training.
  • Partner with ML Engineers to streamline the end-to-end ML lifecycle: data ingestion, feature engineering, training, evaluation, deployment, and monitoring.
  • Build and maintain CI/CD pipelines for ML applications and models using GitHub Actions, GitLab CI/CD, Argo, or Tekton.
  • Integrate with MLOps platforms (e.g., MLflow, Kubeflow, Airflow, SageMaker, Vertex AI) to ensure reproducibility and traceability of experiments.
  • Lead incident response efforts for ML-serving and training infrastructure, minimizing downtime and ensuring high availability.
  • Implement observability practices for ML workloads, including model performance monitoring, drift detection, and metrics via Prometheus, Grafana, and Datadog.
  • Collaborate with security and compliance teams to ensure adherence to data governance, PCI, SOX, and AI/ML data security standards.
  • Optimize system resources for large-scale ML jobs, including auto-scaling GPU clusters, cost optimization, and quota management.
  • Drive continuous improvement across DevOps + MLOps processes; proactively identify areas for enhancement.
  • Maintain clear documentation and foster a culture of knowledge sharing across DevOps, ML, and Data Engineering teams.
  • Participate in 24x7 on-call rotation, with availability to work with global teams in the event of critical outages.

We're Excited if You Have




  • 8+ years of experience in DevOps/SRE roles, including at least 2-3 years supporting ML or data-intensive workloads.
  • Strong programming skills in Python or Go; experience building internal tools and automation for ML pipelines.
  • Hands-on experience with Kubernetes, Docker, ECS/EKS/GKE, and service mesh tools such as Istio or Envoy.
  • Familiarity with GPU/accelerator orchestration (NVIDIA GPU Operator, Kubeflow, Slurm, Ray, or similar).
  • Experience with Infrastructure as Code (IaC): Terraform, Helm, Ansible, or CloudFormation.
  • Deep understanding of distributed systems, microservices architecture, and cloud-native design patterns.
  • Exposure to MLOps tools: MLflow, Kubeflow Pipelines, Airflow, Argo, Vertex AI, or SageMaker.
  • Strong proficiency in cloud platforms (AWS and GCP required; Azure a plus).
  • Knowledge of data engineering concepts (object storage like S3/GCS, Parquet/ORC, data versioning with DVC or Delta Lake).
  • Experience with networking, security, and compliance (role-based access, VPC design, encryption, auditing).
  • Demonstrated success in cross-functional collaboration with ML, Data, and Product teams.
  • Preferred certifications: Certified Kubernetes Administrator (CKA), AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer, NVIDIA Deep Learning Institute courses.
  • AI literacy and curiosity: you have either tried Gen AI in your previous work or outside of work, or you are curious about Gen AI and have explored it.
  • BS degree in Computer Science or equivalent experience.

Benefits





Roku is committed to offering a diverse range of benefits as part of our compensation package to support our employees and their families. Our comprehensive benefits include global access to mental health and financial wellness support and resources. Local benefits include statutory and voluntary benefits which may include healthcare (medical, dental, and vision), life, accident, disability, commuter, and retirement options (401(k)/pension). Our employees can take time off work for vacation and other personal reasons to balance their evolving work and life needs. It's important to note that not every benefit is available in all locations or for every role. For details specific to your location, please consult with your recruiter.




The Roku Culture





Roku is a great place for people who want to work in a fast-paced environment where everyone is focused on the company's success rather than their own. We try to surround ourselves with people who are great at their jobs, who are easy to work with, and who keep their egos in check. We appreciate a sense of humor. We believe a small number of very talented people can do more, at lower cost, than a larger number of less talented teams. We're independent thinkers with big ideas who act boldly, move fast, and accomplish extraordinary things through collaboration and trust. In short, at Roku you'll be part of a company that's changing how the world watches TV.



We have a unique culture that we are proud of. We think of ourselves primarily as problem-solvers, which itself is a two-part idea. We come up with the solution, but the solution isn't real until it is built and delivered to the customer. That penchant for action gives us a pragmatic approach to innovation, one that has served us well since 2002.



To learn more about Roku, our global footprint, and how we've grown, visit https://www.weareroku.com/factsheet.



By providing your information, you acknowledge that you want Roku to contact you about job roles, that you have read Roku's Applicant Privacy Notice, and understand that Roku will use your information as described in that notice. If you do not wish to receive any communications from Roku regarding this role or similar roles in the future, you may unsubscribe here at any time.



Job Detail

  • Job Id: JD4355049
  • Industry: Not mentioned
  • Total Positions: 1
  • Job Type: Full Time
  • Salary: Not mentioned
  • Employment Status: Permanent
  • Job Location: KA, IN, India
  • Education: Not mentioned