IT Engineer 4

KA, IN, India

Job Description

Requisition ID

191384


Date posted

01/05/2026


Work Location Model

Pending Selection


Work Location

Bangalore-IN-Bangalore


Work Country

India

The group you'll be a part of


---------------------------------



The Global Information Systems Group is dedicated to the success of Lam through providing best-in-class and innovative information system solutions and services. Together, we support users globally with data, information, and systems to achieve their business objectives.

The impact you'll make


--------------------------



We are seeking an HPC Systems Engineer to lead the evaluation, deployment, and ongoing management of our large-scale CPU- and GPU-clustered environments. You will be the technical owner for the HPC system lifecycle, from initial hardware planning and installation to advanced performance tuning and troubleshooting. This role is highly collaborative, requiring you to work closely with Networking and Security teams to build a secure, high-speed foundational infrastructure that supports mission-critical research and engineering workloads.

What you'll do


------------------


Cluster Lifecycle Management:

Lead the evaluation, planning, configuration, and physical/virtual deployment of multiple large-scale CPU + GPU clusters.

System Administration:

Perform expert-level Linux system administration, including kernel tuning, security hardening, and OS lifecycle management (e.g., RHEL, Ubuntu, or Rocky Linux).

Workload Management:

Act as the subject matter expert for SLURM, managing complex partitioning, resource quality of service (QoS), and scheduling optimization for mixed workloads.

Infrastructure Design:

Architect and build the physical and logical infrastructure for HPC, including high-speed fabric integration (InfiniBand/Ethernet) and power/cooling planning.

Software Stack & Modules:

Maintain and curate the HPC application stack using software management tools like LMOD or Tcl Modules, ensuring researchers have access to optimized compilers, libraries (MPI, CUDA), and applications.

GPU Optimization:

Spec and tune GPU environments (e.g., NVIDIA H100/B200), focusing on GPUDirect, NVLink topologies, and containerized runtimes like Apptainer/Singularity.

Troubleshooting & Performance:

Conduct deep-dive root cause analysis for complex system failures and performance bottlenecks across compute, network, and software layers.

Cross-Functional Leadership:

Own infrastructure projects, coordinating closely with Networking (low-latency fabric) and Security (compliance, identity management) to ensure all builds meet enterprise standards.

Who we're looking for


-------------------------


Experience with GPU-aware MPI implementations and performance profiling tools (e.g., NVIDIA Nsight, Tau). Knowledge of container orchestration in HPC (e.g., Kubernetes for AI/ML workloads alongside SLURM). Certifications such as RHCE (Red Hat Certified Engineer) or relevant NVIDIA/InfiniBand technical training.

Preferred qualifications


----------------------------


Education:

BS/MS in Computer Science, Electrical Engineering, or a related field.

HPC Experience:

6+ years of hands-on experience managing production-grade HPC clusters.

Scheduler Expertise:

Deep proficiency in SLURM administration, including writing custom prolog/epilog scripts and managing GRES (Generic Resources) for GPUs.

Linux Mastery:

Advanced knowledge of Linux internals, shell scripting (Bash), and at least one high-level language (Python or Go).

Automation:

Extensive experience with configuration management and provisioning tools (e.g., Ansible, Terraform, xCAT, or Warewulf).

Networking:

Familiarity with HPC-specific networking such as InfiniBand (NDR/HDR) and RoCE v2.

Our commitment


------------------



We believe it is important for every person to feel valued, included, and empowered to achieve their full potential. By bringing unique individuals and viewpoints together, we achieve extraordinary results.


Lam Research ("Lam" or the "Company") is an equal opportunity employer. Lam is committed to and reaffirms support of equal opportunity in employment and non-discrimination in employment policies, practices and procedures on the basis of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex (including pregnancy, childbirth and related medical conditions), gender, gender identity, gender expression, age, sexual orientation, or military and veteran status or any other category protected by applicable federal, state, or local laws. It is the Company's intention to comply with all applicable laws and regulations. Company policy prohibits unlawful discrimination against applicants or employees.

Lam offers a variety of work location models based on the needs of each role. Our hybrid roles combine the benefits of on-site collaboration with colleagues and the flexibility to work remotely, and fall into two categories: On-site Flex and Virtual Flex. In an 'On-site Flex' role, you'll work 3+ days per week on-site at a Lam or customer/supplier location, with the opportunity to work remotely for the balance of the week. In a 'Virtual Flex' role, you'll work 1-2 days per week on-site at a Lam or customer/supplier location, and remotely the rest of the time.



Job Detail

  • Job Id
    JD5068579
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    KA, IN, India
  • Education
    Not mentioned
  • Experience
    Year