Develops, deploys, and maintains scalable machine learning (ML) models, with a specialized focus on Large Language Models (LLMs) and other foundation models.
Designs and implements robust APIs and microservices for real-time model serving and inference, particularly focusing on optimizing latency and throughput for large generative models.
Implements MLOps practices (CI/CD) to automate the testing, deployment, and monitoring of ML systems, including specific pipelines for fine-tuning and customizing LLMs.
Optimizes model performance, latency, and resource consumption through techniques like quantization, pruning, and low-rank adaptation (LoRA) for efficient GenAI deployment.
Establishes comprehensive monitoring systems for tracking model health, detecting data and concept drift, and monitoring prompt quality and token usage in production GenAI environments.
Designs and implements ethical AI guardrails, content filters, and safety mechanisms to mitigate risks (e.g., bias, toxicity, and hallucination) associated with deployed GenAI applications.
Manages the underlying infrastructure (e.g., specialized compute clusters) necessary to support the intensive training and serving requirements for large-scale GenAI workloads.
Contributes to the engineering wiki by defining model versioning strategies and prompt management standards, and by documenting deployment architectures and runbooks for GenAI systems.