🎙️ MOSS-TTS Local Transformer v1.5

Multilingual (31 languages), stereo 48 kHz text-to-speech with zero-shot voice cloning, powered by OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5. Upload a short reference clip to clone a voice, or leave the reference empty to use the default voice.

Language tag
Tagging the input with its language improves synthesis quality in v1.5.
Examples