SRE Infrastructure Support Engineer

MH, IN, India

Job Description


We are hiring an SRE (Site Reliability Engineer) Infrastructure Support engineer with deep expertise in Linux,
Kubernetes, and hardware infrastructure management for our enterprise-grade, high-performance
supercomputing platform. We help enterprises and service providers build their AI inference platforms
for end users, powered by our state-of-the-art RDU (Reconfigurable Dataflow Unit) hardware architecture. This
is a high-impact, high-visibility role. The ideal candidate will play a pivotal role in supporting and maintaining
our enterprise infrastructure stack, ensuring high availability and optimal performance across mission-critical AI
& ML environments. The role involves close collaboration with global SRE and Platform teams to manage and
troubleshoot enterprise systems and clusters.

Location: Remote, with openness to travel to KSA (Saudi Arabia) or Turkey for 1 year.

Experience: 10+ years

Key Responsibilities:

Linux Administration: Manage, configure, and optimize Linux servers (RHEL, Ubuntu, or similar),
including patching, security hardening, and performance tuning.

Kubernetes Administration: Deploy, manage, and troubleshoot Kubernetes clusters, ensuring
reliability and scalability.

Hardware Infrastructure Management: Oversee physical data center infrastructure, including servers,
storage, and networking hardware.

Security & Compliance: Apply security patches and upgrades for Linux-based Kubernetes
environments and ensure compliance with organizational policies.

Collaboration & Support: Work closely with SRE and Platform teams worldwide to support enterprise
systems and clusters.

Ticket-Based Case Management: Handle tickets efficiently using tools such as Salesforce or
ServiceNow.

Required Qualifications:

Strong hands-on experience with Linux system administration (RHEL, Ubuntu, or similar).
RHCSA/RHCE certification is a plus.

Solid understanding of Kubernetes administration; CKA/CKS certification is a plus.

Hands-on experience with bare-metal and hardware infrastructure (servers, storage, networking).

Good understanding of networking concepts (TCP/IP, DNS, load balancers, firewalls); knowledge of
Juniper OS is a plus.

Strong troubleshooting skills across hardware, OS, and Kubernetes environments.

Knowledge of automation tools such as Ansible, Python, Bash, or similar is a plus.

Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK) is a plus.

Soft Skills:

Strong communication, problem-solving, and collaboration abilities.

Ability to work effectively in fast-paced, dynamic environments and adapt to evolving AI & ML
technologies.

Proactive mindset with a focus on automation, scalability, and operational excellence.

Why Join Us:

Work on cutting-edge AI & ML infrastructure supporting mission-critical applications.

Collaborate with global teams and gain exposure to advanced cloud-native and enterprise
technologies.

Opportunity to grow your expertise in Linux, Kubernetes, and data center operations.

Job Type: Full-time

Pay: ₹200,000.00 - ₹1,540,374.87 per year

Benefits:

Provident Fund

Work Location: In person



Job Detail

  • Job Id
    JD4313402
  • Total Positions
    1
  • Job Type:
    Full Time
  • Employment Status
    Permanent
  • Job Location
    MH, IN, India