Job Title: AI Systems Engineer - GPU/ROCm/CUDA | ML Frameworks Optimization
Location: Hyderabad
Experience: 3-6 years [Mid-Senior]
We are looking for a passionate and experienced AI Systems Engineer to join our team to work on next-generation Machine Learning technologies and optimize performance across AMD GPU accelerators. This role involves low-level GPU programming, custom ML kernel development, and working with state-of-the-art inference engines.
Key Responsibilities:
Develop and optimize custom Deep Learning GPU kernels using ROCm/CUDA or shader languages
Support and enhance ML model deployment on Linux platforms
Optimize performance of ROCm drivers and inferencing engines for AI/ML workloads
Collaborate closely with internal hardware/software teams to support next-gen GPU accelerators
Profile, debug, and improve performance of GPU kernels and AI model pipelines
Contribute to designing and implementing new AI technologies and workflows
Required Skills & Qualifications:
BS/MS in Computer Science, Electrical Engineering, or equivalent
Strong programming skills in C/C++ and Python
Solid experience working with Linux CLI, bash scripting, or PowerShell
Hands-on experience with Python ML libraries such as PyTorch and Transformers
Knowledge of writing high-performance ML kernels using Triton, JAX, or similar
Experience with debugging tools like gdb and valgrind, and profiling tools such as nsys and rocprof
Familiarity with AI inferencing runtimes such as vLLM, Ollama, llama.cpp, or SGLang
Understanding of GPU and PC architecture and x86/x64 instruction sets
Experience developing with ROCm, CUDA, or shader programming
Nice to Have:
Knowledge of x86 Assembly
Contributions to open-source ML/DL performance libraries
Exposure to compiler optimization techniques for GPU code
What We Offer:
Work on cutting-edge GPU technologies and ML systems
Exposure to performance-critical AI workloads
Collaborative and research-oriented environment
Competitive compensation and career growth opportunities
Apply:
If you are looking for a job change, share your updated resume with vagdevi@semi-leaf.com
Job Type: Full-time
Pay: Up to ₹3,000,000.00 per year
Experience:
Deep Learning GPU kernels using ROCm/CUDA: 2 years (Required)
Programming skills in C/C++, Python: 1 year (Required)
Python ML libraries such as PyTorch, Transformers: 1 year (Required)
Developing with ROCm, CUDA: 1 year (Required)
Work Location: In person
Speak with the employer: +91 7483459258