TTS: [Bug] Newer models perform much worse
Describe the bug
The updated version requires re-downloading the checkpoints. However, the new checkpoints give much worse voice cloning and synthesis than previous versions.
The same issue is also mentioned in the Hugging Face community discussion: https://huggingface.co/coqui/XTTS-v2/discussions/16
Is it possible to use the previous model checkpoints, or a locally downloaded model? The package's model loading seems strange, and I do not see any option to do so.
To Reproduce
import torch
from TTS.api import TTS

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models
print(TTS().list_models())

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS
# ❗ Since this model is a multi-lingual voice cloning model, we must set the target speaker_wav and language
# Text to speech list of amplitude values as output
wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")
Expected behavior
No response
Logs
No response
Environment
TTS 0.21.1
Python 3.10.11
Pytorch 2.1.0+cu121
Ubuntu 22.04
Additional context
No response
About this issue
- State: closed
- Created 7 months ago
- Reactions: 1
- Comments: 15 (2 by maintainers)
The best way is actually to create a folder named model at your script location and put these files inside:
- https://huggingface.co/coqui/XTTS-v2/raw/v2.0.2/config.json
- https://huggingface.co/coqui/XTTS-v2/resolve/v2.0.2/model.pth?download=true
- https://huggingface.co/coqui/XTTS-v2/raw/v2.0.2/vocab.json
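Fetching those three files can also be scripted. The following is only a minimal sketch using the Python standard library: the helper names `pinned_url` and `download_pinned_model` are made up for illustration, and it assumes the `resolve/` URL form works for all three files (the links above use `raw/` for the two JSON files).

```python
from pathlib import Path
from urllib.request import urlretrieve

REPO_URL = "https://huggingface.co/coqui/XTTS-v2"
REVISION = "v2.0.2"  # revision taken from the links in the comment above
FILES = ("config.json", "model.pth", "vocab.json")


def pinned_url(filename: str, revision: str = REVISION) -> str:
    """Build the download URL of one file at a fixed revision (hypothetical helper)."""
    # Assumption: "resolve/" serves all three files; the comment uses "raw/" for the JSON ones.
    return f"{REPO_URL}/resolve/{revision}/{filename}"


def download_pinned_model(target_dir: str = "model") -> Path:
    """Download the checkpoint files into target_dir, skipping files already present."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        destination = target / name
        if destination.exists():
            continue
        # Download under a temporary name so an interrupted transfer
        # is not mistaken for a complete file on the next run.
        partial = destination.with_name(name + ".part")
        urlretrieve(pinned_url(name), str(partial))
        partial.replace(destination)
    return target
```

Calling `download_pinned_model()` once from the script's folder should leave a `model` directory with the three files. Note that `model.pth` is a large download.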
Then load TTS in your code like this:
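The snippet itself is missing from this copy of the thread. As a minimal sketch of what loading from that folder might look like: it uses the `model_path` and `config_path` arguments of `TTS(...)` and assumes that, for XTTS, `model_path` should point at the folder holding `model.pth` and `vocab.json` rather than at the `.pth` file. The helper names `missing_model_files` and `load_local_xtts` are made up for illustration.

```python
from pathlib import Path

REQUIRED_FILES = ("config.json", "model.pth", "vocab.json")


def missing_model_files(model_dir: str = "model") -> list:
    """Return the required files that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def load_local_xtts(model_dir: str = "model"):
    """Load XTTS from a local folder instead of the auto-downloaded checkpoint."""
    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"Missing files in {model_dir!r}: {missing}")

    # Imported here so the file check above works even without TTS installed.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumption: for XTTS, model_path is the checkpoint folder, not model.pth itself.
    return TTS(
        model_path=model_dir,
        config_path=str(Path(model_dir) / "config.json"),
    ).to(device)
```

With the folder in place, `tts = load_local_xtts("model")` could then be followed by the same `tts.tts(text=..., speaker_wav=..., language="en")` call as in the reproduction script.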
You can try to git clone the Hugging Face model repository and check out the old commit.
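A rough sketch of that approach, driven from Python so it can sit next to the rest of the script. It assumes `git` and `git-lfs` are installed (the checkpoint is stored via LFS), and it uses the `v2.0.2` revision from the links above as a stand-in for "the old commit"; a commit hash would work the same way. The function names are made up for illustration.

```python
import subprocess

REPO_URL = "https://huggingface.co/coqui/XTTS-v2"


def clone_commands(revision: str, target_dir: str = "XTTS-v2") -> list:
    """Return the git commands that clone the model repo and pin it to a revision."""
    return [
        ["git", "lfs", "install"],                        # enable LFS so model.pth is fetched
        ["git", "clone", REPO_URL, target_dir],           # full clone of the model repo
        ["git", "-C", target_dir, "checkout", revision],  # switch to the older revision
    ]


def clone_at_revision(revision: str = "v2.0.2", target_dir: str = "XTTS-v2") -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in clone_commands(revision, target_dir):
        subprocess.run(command, check=True)
```

After `clone_at_revision()`, the `XTTS-v2` folder should contain the older `config.json`, `model.pth` and `vocab.json`, which can then be loaded as a local model.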
Update: I saw it's possible to load from a local model. From TTS/api.py:
Example loading a model from a path:
>>> tts = TTS(model_path="/path/to/checkpoint_100000.pth", config_path="/path/to/config.json", progress_bar=False, gpu=False)
>>> tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")