SMTS Systems Design Eng.


Job Description


Hyderabad, India Engineering 62878


-------------------





WHAT YOU DO AT AMD CHANGES EVERYTHING



We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance_







THE TEAM




AMD's Data Center GPU organization is transforming the industry with our AI-based graphics processors. Our primary objective is to design exceptional products that drive the evolution of computing experiences, serving as the cornerstone for enterprise data centers, artificial intelligence (AI), HPC, and embedded systems. If this resonates with you, come and join our Data Center GPU organization, where we are building amazing AI-powered products with amazing people.

THE ROLE:




We are seeking an experienced HPC Systems Engineer with 7+ years of expertise in high-performance computing (HPC) environments. This role requires hands-on experience with Python, Kubernetes (K8s), Slurm, OpenStack, and Ansible, along with the ability to support external clients in live troubleshooting sessions.



THE PERSON:




The ideal candidate will have deep technical knowledge of drivers, troubleshooting methods, and system-level debugging and will play a key role in managing, optimizing, and troubleshooting HPC clusters and cloud-based HPC environments.




KEY RESPONSIBILITIES:



HPC System Administration & Troubleshooting

  • Manage and optimize HPC clusters, ensuring high availability and performance.
  • Troubleshoot GPU, CPU, network drivers, firmware, and OS-level issues.
  • Debug storage, networking, and job scheduling bottlenecks in Slurm-based environments.

Kubernetes & Cloud HPC Environments

  • Deploy and manage HPC workloads in Kubernetes for AI/ML and parallel computing.
  • Optimize OpenStack-based HPC clusters with Ceph, Cinder, and Neutron for cloud scalability.
  • Implement containerized HPC workflows using Kubernetes and OpenShift.

Automation & Infrastructure as Code (IaC)

  • Develop Ansible and Terraform scripts for provisioning and managing HPC resources.
  • Automate job scheduling, cluster monitoring, and log analysis using Python.
  • Optimize CI/CD pipelines for HPC and AI/ML applications.

Performance Tuning & Benchmarking

  • Benchmark and optimize multi-node HPC workloads (MPI, NCCL, ROCm, CUDA).
  • Tune OS parameters, networking (InfiniBand, RoCE), and Slurm configurations for peak performance.
  • Enhance HPC storage performance (Ceph, Lustre, NFS) and distributed computing efficiency.

Client Support & Collaboration

  • Provide real-time technical support and troubleshooting for HPC users.
  • Engage with developers, DevOps, and system administrators to optimize cluster performance.
  • Document solutions and best practices, and contribute to internal knowledge bases.

PREFERRED QUALIFICATIONS:



  • Experience with AMD MI300, MI2X0 GPUs, ROCm, MPI, UCX, or XPMEM.
  • Exposure to containerized workloads using Singularity or Docker in HPC.
  • Familiarity with OpenStack deployment automation (e.g., TripleO, Kolla, or OpenStack-Ansible).
  • Experience in customer-facing technical roles, with a strong ability to troubleshoot live issues.

This role is critical in ensuring seamless HPC operations, troubleshooting complex system issues, and supporting high-profile clients with real-time problem resolution in both bare-metal and cloud-based HPC environments.

ACADEMIC CREDENTIALS:



Bachelor's or Master's degree in Computer Engineering or Electrical/Electronics Engineering



AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.



Job Detail

  • Job Id
    JD3741317
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    TS, IN, India
  • Education
    Not mentioned
  • Experience
    Year