Machine Learning Engineer (Audio & Video Models)

MH, IN, India

Job Description

Key Responsibilities

  • Design, train, and optimize audio and video ML models, including classification, detection, segmentation, generative models, speech processing, and multimodal architectures.
  • Develop and maintain data pipelines for large-scale audio/video datasets, ensuring quality, labeling consistency, and efficient ingestion.
  • Implement model evaluation frameworks that measure robustness, latency, accuracy, and overall performance across real-world conditions.
  • Work with product teams to transform research prototypes into production-ready models with reliable inference performance.
  • Optimize models for scalability, low latency, and edge/cloud deployment, including quantization, pruning, and hardware-aware tuning.
  • Collaborate with cross-functional teams to define technical requirements and experiment roadmaps.
  • Monitor and troubleshoot production models, ensuring reliability and continuous improvement.
  • Stay current with trends in deep learning, computer vision, speech processing, and multimodal AI.


Required Qualifications

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, Machine Learning, or a related field (PhD a plus).
  • Strong experience with deep learning frameworks such as PyTorch or TensorFlow.
  • Proven experience training and deploying audio or video models, such as:
      • Speech recognition, speech enhancement, speaker identification
      • Audio classification, event detection
      • Video classification, action recognition, tracking
      • Video-to-text, lip reading, multimodal fusion models
  • Solid understanding of neural network architectures (CNNs, RNNs, Transformers, diffusion models, etc.).
  • Proficiency in Python, along with ML tooling for experimentation and production (e.g., NumPy, OpenCV, FFmpeg, PyTorch Lightning).
  • Experience working with GPU/TPU environments, distributed training, and model optimization.
  • Ability to write clean, maintainable, production-quality code.


Preferred Qualifications

  • Experience with foundation models or multimodal transformers (e.g., audio-language, video-language).
  • Background in signal processing, feature extraction (MFCCs, spectrograms), or codec-level audio/video understanding.
  • Experience with MLOps tools (e.g., MLflow, Weights & Biases, Kubeflow, Airflow).
  • Knowledge of cloud platforms (AWS, GCP, Azure) and scalable model serving frameworks.
  • Experience with real-time audio/video processing for streaming applications.
  • Publications, open-source contributions, or competitive ML achievements are a plus.

Experience: Minimum 2 years



Job Detail

  • Job Id
    JD5039478
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    MH, IN, India
  • Education
    Not mentioned
  • Experience
    Min 2 years