About the Role
We are looking for a Systems or Solutions Architect with deep expertise in networking, infrastructure-as-a-service (IaaS), and cloud-scale system design to help architect and optimize AI/ML infrastructure.
The ideal candidate combines strong fundamentals in cloud architecture (AWS or equivalent), networking, compute, and storage with hands-on experience in Kubernetes, observability, and automation.
You'll design scalable systems that support large AI workloads, enabling efficient training, inference, and data pipelines across distributed environments.
Key Responsibilities
Architect and scale AI/ML infrastructure across public cloud (AWS / Azure / GCP) and hybrid environments.
Design and optimize compute, storage, and network topologies for distributed training and inference clusters.
Build and manage containerized environments using Kubernetes, Docker, and Helm.
Develop automation frameworks for provisioning, scaling, and monitoring infrastructure using Python, Go, and IaC (Terraform / CloudFormation).
Partner with data science and MLOps teams to align on AI infrastructure requirements (GPU/CPU scaling, caching, throughput, latency).
Implement observability, logging, and tracing using Prometheus, Grafana, CloudWatch, or OpenTelemetry.
Drive networking automation (BGP, routing, load balancing, VPNs, service meshes) using software-defined networking (SDN) and modern APIs.
Lead performance, reliability, and cost-optimization efforts for AI training and inference pipelines.
Collaborate cross-functionally with product, platform, and operations teams to ensure secure, performant, and resilient infrastructure.
Required Qualifications
Knowledge of AI/ML infrastructure patterns, including distributed training, inference pipelines, and GPU orchestration.
Bachelor's or Master's degree in Computer Science, Information Technology, or a related field.
10+ years of experience in systems, infrastructure, or solutions architecture roles.