We are looking for a passionate AI Research Scientist with expertise in Computer Vision and Video Understanding to join our AI Research Team. You will work on developing deep learning-based solutions that help machines interpret, analyze, and understand complex video content.
The role focuses on applied research, model development, and prototyping of scalable AI solutions that can be deployed across enterprise use cases.
Key Responsibilities
Research and prototype deep learning models for video understanding, including:
Action and gesture recognition
Object tracking and temporal event detection
Video summarization and scene understanding
Deepfake and manipulated content detection
Implement and optimize architectures such as 3D CNNs, Vision Transformers, and temporal models.
Contribute to dataset design, annotation, and preprocessing pipelines for video-based training.
Conduct experiments to benchmark and improve model accuracy, robustness, and inference performance.
Collaborate with senior researchers and engineering teams to integrate research outputs into production systems.
Stay current with the latest research in video AI, multimodal learning, and transformer architectures, and evaluate its practical applications.
Document methodologies, publish internal reports, and contribute to patents or external publications where applicable.
Required Qualifications
Master's degree or higher in Computer Science, Electrical Engineering, or a related field.
3-7 years of experience in computer vision or deep learning research, with a focus on video data.
Strong programming skills in Python and experience with PyTorch or TensorFlow.
Hands-on experience with spatio-temporal models such as I3D, SlowFast, or TimeSformer.
Solid understanding of feature extraction, self-supervised learning, and transformer-based architectures.
Experience working with video datasets like Kinetics, Something-Something, or DFDC.
Proven ability to run experiments end to end, from data preparation to evaluation and reporting.
Strong analytical thinking and communication skills.
Published papers or patents, or a record of inventing new architectures that enhance model performance.
Preferred Qualifications
Experience in multimodal AI (vision + audio + text) or generative video modeling.
Exposure to video forensics or media authenticity detection.
Experience with distributed training, GPU optimization, or MLOps pipelines.
Familiarity with open-source frameworks such as Hugging Face, OpenMMLab, or PyTorch Lightning.
Job Types: Full-time, Permanent
Pay: ₹397,020.86 - ₹1,745,814.57 per year
Work Location: In person