Next-gen Kaldi: Text-to-speech (TTS)
This space shows how to convert text to speech with Next-gen Kaldi.
It runs on CPU inside a Docker container provided by Hugging Face.
Voice Cloning: Select "Voice Cloning" as the language to use one of the voice cloning models:
- Pocket TTS: Supports 6 languages (English, French, German, Portuguese, Italian, Spanish). Requires only a reference audio clip.
- ZipVoice: Supports Chinese and English. Requires both a reference audio clip and the exact text spoken in that clip.
With either model, you need to provide a reference audio clip (uploaded, recorded, or given as a URL) to clone the voice.
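The input requirements of the two voice cloning models can be summarized in a small check. This is only a sketch: the names `MODELS`, `CloneRequest`, and `validate` are hypothetical and are not part of sherpa-onnx or of this space. Only the language lists and the rule about the reference text come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Requirements as listed above. The table layout and all names are
# hypothetical illustrations, not the sherpa-onnx API.
MODELS = {
    "pocket-tts": {
        "languages": {"English", "French", "German",
                      "Portuguese", "Italian", "Spanish"},
        "needs_reference_text": False,
    },
    "zipvoice": {
        "languages": {"Chinese", "English"},
        "needs_reference_text": True,
    },
}


@dataclass
class CloneRequest:
    model: str
    language: str
    text: str                        # text to synthesize
    reference_audio: Optional[str]   # path or URL of the reference clip
    reference_text: Optional[str] = None  # transcript of the reference clip


def validate(req: CloneRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is usable."""
    if req.model not in MODELS:
        return [f"unknown model: {req.model}"]
    spec = MODELS[req.model]
    problems = []
    if req.language not in spec["languages"]:
        problems.append(f"{req.model} does not support {req.language}")
    # Both models need a reference clip to clone the voice from.
    if not req.reference_audio:
        problems.append("a reference audio clip is required")
    # Only ZipVoice also needs the exact text spoken in the clip.
    if spec["needs_reference_text"] and not (req.reference_text or "").strip():
        problems.append("the exact text of the reference audio is required")
    return problems
```

For example, `validate(CloneRequest("zipvoice", "Chinese", "你好", "ref.wav"))` reports the missing reference text, while the same request with `reference_text` filled in returns an empty list.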
For more information, see the following links:
- https://github.com/k2-fsa/sherpa-onnx
- https://github.com/kyutai-labs/pocket-tts
- https://github.com/k2-fsa/zipvoice
- https://k2-fsa.github.io/sherpa/onnx/tts/pocket.html
- https://k2-fsa.github.io/sherpa/onnx/tts/zipvoice.html
If you want to deploy it locally, please see https://k2-fsa.github.io/sherpa/
If you want to use Android APKs, please see https://k2-fsa.github.io/sherpa/onnx/tts/apk.html
If you want to use Android text-to-speech engine APKs, please see https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
If you want to download an all-in-one exe for Windows, please see https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models
See also https://k2-fsa.github.io/sherpa/onnx/tts/all/ for models with audio samples.