Design, train, and optimize audio and video ML models, including classification, detection, segmentation, generative models, speech processing, and multimodal architectures.
Develop and maintain data pipelines for large-scale audio/video datasets, ensuring quality, labeling consistency, and efficient ingestion.
Implement model evaluation frameworks that measure robustness, latency, accuracy, and overall performance across real-world conditions.
Work with product teams to transform research prototypes into production-ready models with reliable inference performance.
Optimize models for scalability, low latency, and edge/cloud deployment, using techniques such as quantization, pruning, and hardware-aware tuning.
Collaborate with cross-functional teams to define technical requirements and experiment roadmaps.
Monitor and troubleshoot production models, ensuring reliability and continuous improvement.
Stay current with trends in deep learning, computer vision, speech processing, and multimodal AI.
Required Qualifications
Bachelor's or Master's degree in Computer Science, Electrical Engineering, Machine Learning, or a related field (PhD a plus).
Strong experience with deep learning frameworks such as PyTorch or TensorFlow.
Proven experience training and deploying audio or video models, such as:
Speech recognition, speech enhancement, speaker identification
Audio classification, event detection
Video classification, action recognition, tracking
Video-to-text, lip reading, multimodal fusion models
Solid understanding of neural network architectures (CNNs, RNNs, Transformers, diffusion models, etc.).
Proficiency in Python, along with ML tooling for experimentation and production (e.g., NumPy, OpenCV, FFmpeg, PyTorch Lightning).
Experience working with GPU/TPU environments, distributed training, and model optimization.
Ability to write clean, maintainable production-quality code.
Preferred Qualifications
Experience with foundation models or multimodal transformers (e.g., audio-language, video-language).
Background in signal processing, feature extraction (MFCCs, spectrograms), or codec-level audio/video understanding.
Experience with MLOps tools (e.g., MLflow, Weights & Biases, Kubeflow, Airflow).
Knowledge of cloud platforms (AWS, GCP, Azure) and scalable model serving frameworks.
Experience with real-time audio/video processing for streaming applications.
Publications, open-source contributions, or competitive ML achievements are a plus.
Experience: Minimum 2 years