Choosing Capgemini means joining a team where you'll be empowered to build cutting-edge AI infrastructure, supported by a collaborative global community, and inspired to reimagine what's possible. Join us in enabling scalable, fault-tolerant AI systems that power next-generation machine learning workloads.
Your Role
As an AI Runtime Engineer, you will design and optimize distributed AI runtimes that enable high-performance, multi-node, multi-GPU training at scale. You'll work closely with AI infrastructure teams to build elastic, fault-tolerant systems and ensure seamless orchestration for advanced AI workloads.
In this role, you will:
+ Architect and implement distributed AI runtime systems with elastic scaling and job recovery.
+ Optimize low-level performance (CUDA, NCCL, PyTorch internals) for multi-GPU workloads.
+ Develop custom runtime architectures for large-scale AI training pipelines.
+ Integrate orchestration tools such as Kubernetes, Ray, TorchElastic, and Horovod for containerized AI workloads.
+ Implement fault recovery mechanisms and observability hooks for runtime health monitoring.
+ Collaborate with AI researchers and platform engineers to ensure efficient resource utilization and throughput optimization.
+ Contribute to CI/CD pipelines for AI infrastructure and runtime deployments.
Your Profile
Mandatory Skills:
+ Hands-on experience in distributed training systems and multi-node/multi-GPU orchestration.
+ Expertise in PyTorch internals, CUDA, NCCL, and performance profiling.
+ Strong knowledge of Kubernetes, containerization, and orchestration frameworks.
Preferred Skills:
+ Experience with TorchElastic, Ray, and Horovod.
+ Open-source contributions to PyTorch or runtime libraries.
+ Background in HPC, compilers, or systems research.
Education:
+ Bachelor's/Master's in Computer Science, Engineering, or related field.
About Us
At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world's most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities. Where you can make a difference. Where no two days are the same.
Ref. code: 352835-en_GB
Posted on: 08 Dec 2025
Experience level: Experienced Professionals
Contract type: Permanent
Location: Bangalore
Business unit: Engineering and R&D Services
Brand: Capgemini Engineering
Professional communities: Data & AI