focused on naturalness, expressiveness, latency, and robustness
Design and train TTS systems for real-world voices across accents, languages, and speaking styles
Improve streaming and low-latency speech synthesis pipelines
Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)
Translate research ideas into production-ready TTS systems
Collaborate closely with infra, product, and voice engineering teams
What we're looking for
--------------------------
Strong background in Text-to-Speech / speech generation research
Hands-on experience with deep learning frameworks (PyTorch preferred)
Experience with real-time or low-latency TTS systems
Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)
Ability to think end-to-end: data, model, inference, and deployment
Prior work in multilingual, expressive, or accented speech synthesis is a strong plus
Great to have
-----------------
Publications in top speech / ML conferences
Experience deploying TTS models in real-time production
Exposure to conversational AI or voice agents
Years of Experience
-----------------------
3-6 years of specialized experience in speech through academia or industry
Education
-------------
Master's or PhD in Speech, ML, or a related field
Note: We often make exceptions and hire brilliant candidates regardless of years of experience or education. Proof of work is paramount.
Compensation Range: $60K - $100K