Lead AI Infrastructure Engineer

Remote, IN, India

Job Description

This role is for one of our clients



Industry: Technology, Information and Media
Seniority level: Mid-Senior level

Min Experience: 6 years
Location: Remote (India)
Job Type: Full-time

We are seeking a highly skilled Lead AI Infrastructure Engineer to drive the development and management of our AI and ML infrastructure. This role blends technical leadership with hands-on execution, overseeing the end-to-end ML lifecycle, from model training and deployment to monitoring, optimization, and scaling. You will lead a small team of engineers while ensuring seamless collaboration between research, engineering, and operations teams.

Key Responsibilities

ML Infrastructure & Lifecycle Management


Design, maintain, and optimize scalable infrastructure for ML training, inference, and experimentation.
Ensure model deployment pipelines are reliable, efficient, and cost-effective.
Implement robust monitoring, alerting, and automated rollback mechanisms to maintain system reliability.

Collaboration with Research & Product Teams


Partner with research teams to streamline workflows for training, evaluation, and fine-tuning of models.
Support AI-driven initiatives across product teams by providing reliable infrastructure and operational expertise.

Team Leadership & Mentorship


Lead a small team of ML engineers, providing guidance, mentoring, and technical support.
Balance hands-on engineering work with strategic oversight of infrastructure projects.

Performance & Optimization


Reduce model inference latency while improving throughput and cost-efficiency.
Apply model optimization techniques such as quantization, distillation, and TensorRT integration.

Automation & Best Practices


Develop and enforce CI/CD practices for ML models, including versioning, testing, and deployment.
Establish MLOps standards and operational excellence across teams.

Cloud & Platform Management


Leverage cloud-based ML platforms (AWS SageMaker, GCP Vertex AI, Azure ML) to optimize workflows and costs.
Maintain secure, compliant, and scalable AI environments for both training and inference workloads.

Architecture & Strategy


Contribute to ML architecture design, documentation, and roadmap planning.
Continuously evaluate emerging AI infrastructure technologies to improve efficiency and performance.

Qualifications & Skills


5+ years of hands-on experience in MLOps, ML Engineering, or AI Infrastructure roles.
Strong understanding of ML/DL concepts with applied experience in model training and deployment.
Proficiency with cloud-native ML platforms: AWS SageMaker, GCP Vertex AI, or Azure ML.
Experience with Kubernetes, Docker, MLflow, Kubeflow, or similar containerization, orchestration, and experiment-tracking tools.
Familiarity with model optimization techniques: quantization, distillation, TensorRT, FasterTransformer.
Proven ability to lead technical projects and mentor engineers in a fast-paced environment.
Excellent communication and cross-functional collaboration skills.
Ownership-driven mindset and ability to bring clarity to ambiguous technical challenges.

Core Skills


MLOps | ML Infrastructure | Model Deployment | Model Monitoring | CI/CD for ML | Cloud ML Platforms | Kubernetes | Docker | Vertex AI | AWS SageMaker | Kubeflow | MLflow | Model Optimization



Job Detail

  • Job Id
    JD4449198
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    Remote, IN, India
  • Education
    Not mentioned