TTS: [Bug] Newer models perform much worse

Describe the bug

The updated version requires re-downloading the checkpoints. However, the new checkpoints give much worse voice cloning/synthesis than previous versions.

The same issue is also mentioned in the Hugging Face community discussion: https://huggingface.co/coqui/XTTS-v2/discussions/16

Is it possible to use the previous model checkpoints or a locally downloaded model? The package's model loading seems strange, and I do not see any option to do so.

To Reproduce

import torch
from TTS.api import TTS

# Get device
device = "cuda" if torch.cuda.is_available() else "cpu"

# List available 🐸TTS models
print(TTS().list_models())

# Init TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

# Run TTS
# ❗ Since this model is a multi-lingual voice cloning model, we must set the target speaker_wav and language
# Text to speech list of amplitude values as output
wav = tts.tts(text="Hello world!", speaker_wav="my/cloning/audio.wav", language="en")

Expected behavior

No response

Logs

No response

Environment

TTS 0.21.1
Python 3.10.11
Pytorch 2.1.0+cu121
Ubuntu 22.04

Additional context

No response

About this issue

  • State: closed
  • Created 7 months ago
  • Reactions: 1
  • Comments: 15 (2 by maintainers)

Most upvoted comments

The best way, actually, is to create a folder named model next to your script and put these files inside:

https://huggingface.co/coqui/XTTS-v2/raw/v2.0.2/config.json
https://huggingface.co/coqui/XTTS-v2/resolve/v2.0.2/model.pth?download=true
https://huggingface.co/coqui/XTTS-v2/raw/v2.0.2/vocab.json

Then load TTS in your code like this:

model = TTS(model_path="model/", config_path="model/config.json").to(device)
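For anyone who would rather script the manual download above, here is a minimal sketch using only the standard library. The URLs and the v2.0.2 revision come straight from the links in this comment; the helper names (missing_files, download_pinned_checkpoint) and the default folder name are made up for illustration, and I have not checked whether other files are needed for every use case.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Revision and file URLs are copied from the links in the comment above.
BASE = "https://huggingface.co/coqui/XTTS-v2"
REVISION = "v2.0.2"
FILES = {
    "config.json": f"{BASE}/raw/{REVISION}/config.json",
    "model.pth": f"{BASE}/resolve/{REVISION}/model.pth?download=true",
    "vocab.json": f"{BASE}/raw/{REVISION}/vocab.json",
}


def missing_files(model_dir):
    """Return the names of checkpoint files not yet present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in FILES if not (model_dir / name).is_file()]


def download_pinned_checkpoint(model_dir="model"):
    """Download any missing files into model_dir and return the directory.

    Hypothetical helper: it only fetches the three files listed above and
    skips files that already exist, so a partial download is not repaired.
    """
    target = Path(model_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name in missing_files(target):
        # model.pth is large, so this call can take a while.
        urlretrieve(FILES[name], str(target / name))
    return target
```

Calling download_pinned_checkpoint("model") once before the TTS(model_path="model/", config_path="model/config.json") line should leave the folder in the state described above.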

I thought my local model was corrupted, so I deleted it, and now I can't use it anymore. I hope this problem will be solved soon.

You can try to git clone the HF model and check out the old commit.
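A rough sketch of the clone-and-checkout suggestion, driven from Python so it can sit in the same script. The repo URL is inferred from the Hugging Face links elsewhere in this thread, the revision is whatever commit or tag you want to go back to (the comment does not name one), and the function names are hypothetical. It assumes git is installed; the large model.pth is most likely stored with Git LFS, so git-lfs probably needs to be set up as well.

```python
import subprocess

# Assumption: repo URL inferred from the Hugging Face links in this thread.
REPO_URL = "https://huggingface.co/coqui/XTTS-v2"


def checkout_commands(revision, dest="XTTS-v2"):
    """Build the git commands that clone the repo and check out a revision."""
    return [
        ["git", "clone", REPO_URL, dest],
        ["git", "-C", dest, "checkout", revision],
    ]


def clone_at_revision(revision, dest="XTTS-v2"):
    """Run the commands in order; raises CalledProcessError if git fails."""
    for cmd in checkout_commands(revision, dest):
        subprocess.run(cmd, check=True)
    return dest
```

For example, clone_at_revision("v2.0.2") would use the tag that appears in the download links above, assuming that tag is what you want; any older commit hash from the repo history should work the same way.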

Update: I saw it's possible to load from a local model. From TTS/api.py:

Example loading a model from a path:

>>> tts = TTS(model_path="/path/to/checkpoint_100000.pth", config_path="/path/to/config.json", progress_bar=False, gpu=False)
>>> tts.tts_to_file(text="Ich bin eine Testnachricht.", file_path="output.wav")