We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.
AMD together we advance_
THE TEAM
AMD's Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, artificial intelligence (AI), HPC, and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.
THE ROLE:
We are seeking an experienced HPC Systems Engineer with 7+ years of expertise in high-performance computing (HPC) environments. This role requires hands-on experience with Python, Kubernetes (K8s), Slurm, OpenStack, and Ansible, along with the ability to support external clients in live troubleshooting sessions.
THE PERSON:
The ideal candidate will have deep technical knowledge of drivers, troubleshooting methods, and system-level debugging, and will play a key role in managing, optimizing, and troubleshooting HPC clusters and cloud-based HPC environments.
KEY RESPONSIBILITIES:
HPC System Administration & Troubleshooting
Manage and optimize HPC clusters, ensuring high availability and performance.
Troubleshoot GPU, CPU, and network driver issues, as well as firmware and OS-level problems.
Debug storage, networking, and job scheduling bottlenecks in Slurm-based environments.
Kubernetes & Cloud HPC Environments
Deploy and manage HPC workloads in Kubernetes for AI/ML and parallel computing.
Optimize OpenStack-based HPC clusters with Ceph, Cinder, and Neutron for cloud scalability.
Implement containerized HPC workflows using Kubernetes and OpenShift.
Automation & Infrastructure as Code (IaC)
Develop Ansible and Terraform scripts for provisioning and managing HPC resources.
Automate job scheduling, cluster monitoring, and log analysis using Python.
Optimize CI/CD pipelines for HPC and AI/ML applications.
Performance Tuning & Benchmarking
Benchmark and optimize multi-node HPC workloads (MPI, NCCL, ROCm, CUDA).
Tune OS parameters, networking (InfiniBand, RoCE), and Slurm configurations for peak performance.
Enhance HPC storage performance (Ceph, Lustre, NFS) and distributed computing efficiency.
Client Support & Collaboration
Provide real-time technical support and troubleshooting for HPC users.
Engage with developers, DevOps, and system administrators to optimize cluster performance.
Document solutions and best practices, and contribute to internal knowledge bases.
PREFERRED QUALIFICATIONS:
Experience with AMD MI300, MI2X0 GPUs, ROCm, MPI, UCX, or XPMEM.
Exposure to containerized workloads using Singularity or Docker in HPC.
Familiarity with OpenStack deployment automation (e.g., TripleO, Kolla, or OpenStack-Ansible).
Experience in customer-facing technical roles, with a strong ability to troubleshoot live issues.
This role is critical in ensuring seamless HPC operations, troubleshooting complex system issues, and supporting high-profile clients with real-time problem resolution in both bare-metal and cloud-based HPC environments.
ACADEMIC CREDENTIALS:
Bachelor's or Master's degree in Computer Engineering or Electrical/Electronics Engineering
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.
Beware of fraud agents! Do not pay money to get a job.
MNCJobsIndia.com will not be responsible for any payment made to a third-party. All Terms of Use are applicable.
Job Detail
Job Id: JD3741317
Industry: Not mentioned
Total Positions: 1
Job Type: Full Time
Salary: Not mentioned
Employment Status: Permanent
Job Location: TS, IN, India
Education: Not mentioned
Experience: Year