AI Systems Engineer – GPU/ROCm/CUDA | ML Frameworks Optimization

TS, IN, India

Job Description

Job Title: AI Systems Engineer - GPU/ROCm/CUDA | ML Frameworks Optimization



Location: Hyderabad

Experience: 3-6 years [Mid-Senior]



We are looking for a passionate and experienced AI Systems Engineer to join our team, work on next-generation machine learning technologies, and optimize performance across AMD GPU accelerators. This role involves low-level GPU programming, custom ML kernel development, and working with state-of-the-art inference engines.

Key Responsibilities:



  • Develop and optimize custom Deep Learning GPU kernels using ROCm/CUDA or shader languages
  • Support and enhance ML model deployment on Linux platforms
  • Optimize performance of ROCm drivers and inferencing engines for AI/ML workloads
  • Collaborate closely with internal hardware/software teams to support next-gen GPU accelerators
  • Profile, debug, and improve performance of GPU kernels and AI model pipelines
  • Contribute to designing and implementing new AI technologies and workflows

Required Skills & Qualifications:



  • BS/MS in Computer Science, Electrical Engineering, or equivalent
  • Strong programming skills in C/C++, Python
  • Solid experience working with Linux CLI, bash scripting, or PowerShell
  • Hands-on experience with Python ML libraries such as PyTorch, Transformers
  • Knowledge of writing high-performance ML kernels using Triton, JAX, or similar
  • Experience with debugging tools like gdb, valgrind, and profiling tools such as nsys, rocprof
  • Familiarity with AI inferencing runtimes such as vllm, ollama, llama.cpp, or sglang
  • Understanding of GPU and PC architecture, x86/x64 instruction sets
  • Experience developing with ROCm, CUDA, or shader programming

Nice to Have:



  • Knowledge of x86 Assembly
  • Contributions to open-source ML/DL performance libraries
  • Exposure to compiler optimization techniques for GPU code

What We Offer:



  • Work on cutting-edge GPU technologies and ML systems
  • Exposure to performance-critical AI workloads
  • Collaborative and research-oriented environment
  • Competitive compensation and career growth opportunities

Apply:

If you are looking for a job change, share your updated resume with vagdevi@semi-leaf.com



Job Type: Full-time

Pay: Up to ₹3,000,000.00 per year

Experience:

  • Deep Learning GPU kernels using ROCm/CUDA: 2 years (Required)
  • Programming skills in C/C++, Python: 1 year (Required)
  • Python ML libraries such as PyTorch, Transformers: 1 year (Required)
  • Developing with ROCm, CUDA: 1 year (Required)

Work Location: In person

Speak with the employer


+91 7483459258

Beware of fraud agents! Do not pay money to get a job.

MNCJobsIndia.com will not be responsible for any payment made to a third-party. All Terms of Use are applicable.


Job Detail

  • Job Id
    JD3982798
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    TS, IN, India
  • Education
    Not mentioned
  • Experience
    Year