infrastructure layer behind large-scale LLM training and inference.
If your strength lies in systems, performance tuning, reliability, and distributed runtimes, this role is for you. If you primarily work on model experimentation or notebooks, this role is not a fit.
What You'll Own
AI Runtime Architecture
- Design and own runtime infrastructure for distributed training and inference
- Build elastic, fault-tolerant systems (scaling, retries, recovery)
- Strengthen orchestration of PyTorch-based distributed workloads
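To give a flavor of the resilience work this group of responsibilities describes, here is a minimal sketch of a retry-with-checkpoint-recovery loop. All names (`step_fn`, `load_checkpoint`, `save_checkpoint`) are hypothetical stand-ins, not part of any framework mentioned in the posting:

```python
def run_with_recovery(step_fn, load_checkpoint, save_checkpoint,
                      total_steps, max_retries=3):
    """Run a training loop that resumes from the last checkpoint on failure.

    Hypothetical callables: step_fn(state, step) advances training one step;
    load_checkpoint() -> (state, step); save_checkpoint(state, step) persists
    a durable progress marker.
    """
    retries = 0
    state, step = load_checkpoint()            # resume point
    while step < total_steps:
        try:
            state = step_fn(state, step)
            save_checkpoint(state, step + 1)   # record completed step
            step += 1
            retries = 0                        # reset after a good step
        except RuntimeError:                   # e.g. a simulated worker crash
            retries += 1
            if retries > max_retries:
                raise
            state, step = load_checkpoint()    # roll back to last good step
    return state
```

A real system would add backoff, distributed coordination, and atomic checkpoint writes; this only illustrates the resume-on-failure shape.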
Performance & Systems Engineering
- Profile and optimize latency, throughput, and GPU utilization
- Tune multi-GPU / multi-node training and inference pipelines
- Debug low-level issues across runtime, memory, and networking
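As a sketch of the measurement discipline behind this group, a simplified latency/throughput profiler (pure Python; real GPU timing additionally requires device synchronization, e.g. `torch.cuda.synchronize()`, before reading the clock):

```python
import time
from statistics import mean, quantiles

def profile_fn(fn, warmup=5, iters=50):
    """Measure mean and p95 latency plus throughput for a callable.

    A hypothetical helper for illustration, not a library API.
    """
    for _ in range(warmup):                  # discard cold-start effects
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    p95 = quantiles(samples, n=20)[-1]       # last of 19 cut points ~ p95
    return {"mean_s": mean(samples),
            "p95_s": p95,
            "throughput_per_s": iters / sum(samples)}
```

Warming up before measuring and reporting a tail percentile, not just the mean, are the two habits that carry over directly to GPU pipeline tuning.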
Platform & Tooling
- Build internal frameworks for training, checkpointing, recovery, and deployment
- Implement observability, diagnostics, and resilience
- Drive CI/CD and production-readiness standards for AI runtime systems
Technical Leadership
- Own technical direction and delivery
- Mentor engineers through code reviews and architecture discussions
- Collaborate with infra, research, and product teams
Mandatory Requirements (Non-Negotiable)
- 4+ years of strong software / systems engineering experience
- 1+ year owning AI runtime infrastructure (distributed training or inference)
- Hands-on PyTorch runtime optimization (mandatory)
- Proven low-level performance engineering experience
- Strong Python and C++ skills (Java acceptable)
- Prior experience leading or mentoring engineers
Good to Have
- Kubernetes, Ray, TorchElastic, or custom orchestration frameworks
- LLM training pipelines, fine-tuning, checkpointing, elastic training
- Multi-GPU, multi-node cloud-native workloads
- Job scheduling, failure recovery, production-grade runtime systems
Job Type: Full-time
Pay: ₹1,200,000.00 - ₹1,600,000.00 per year
Work Location: In person