TTS: Errors when trying to train SC-GlowTTS
Describe the bug
I am trying to train the SC-GlowTTS model. I downloaded the config from the latest release and tried to launch `TTS/bin/train_glow_tts.py`. However, I keep running into errors about values missing from the config: first it was `stats_path`, then `use_noise_augment`, and now I get `AssertionError: 22050 vs 48000`, even though the config states "wav sample-rate. If different than the original data, it is resampled". What is the proper way to train SC-GlowTTS? 😃
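For context, this is roughly the workaround I have been attempting: patching the missing keys into the downloaded config before launching training. The key names come from the errors above; the values (and the assumption that the VCTK audio is 48 kHz) are my own guesses, not settings from the release config.

```python
import json

# Sketch of the workaround: add the keys the training script complains about.
# Values are assumptions, not the settings shipped with the release config.
with open("config.json") as f:
    config = json.load(f)

audio = config["audio"]
audio.setdefault("stats_path", None)          # no precomputed normalization stats
audio.setdefault("use_noise_augment", False)  # disable noise augmentation

# The assertion compares the config sample rate (22050) with the data (48000).
# Either resample the VCTK wavs to 22050 Hz beforehand, or make the config
# match the data, e.g.:
audio["sample_rate"] = 48000

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```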
To Reproduce
Steps to reproduce the behavior:
- Download and unzip the SC-GlowTTS config from the v0.0.13 release (https://github.com/coqui-ai/TTS/releases/download/v0.0.12/tts_models--en--vctk--sc-glowtts-transformer.zip)
- Download and unzip the VCTK dataset, e.g. from here (link from the SC-GlowTTS repo)
- Replace the dataset path in the config with yours (see the config sketch after this list)
- Download and install Coqui TTS:
git clone https://github.com/coqui-ai/TTS && cd TTS && pip install -e .
- From the TTS directory, run the training script with your config path:
python TTS/bin/train_glow_tts.py --config_path /path/to/config/
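For the third step, this is a minimal sketch of how I edit the downloaded config to point at my local VCTK copy. The field names follow the structure of the release config; the paths are placeholders for my setup.

```python
import json

# Point the dataset entry of the downloaded config at the local VCTK copy.
# Both paths below are placeholders for my machine.
with open("config.json") as f:
    config = json.load(f)

config["datasets"][0]["path"] = "/data/VCTK-Corpus/"     # local VCTK root (placeholder)
config["output_path"] = "/data/experiments/sc-glowtts/"  # where checkpoints go (placeholder)

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```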
Expected behavior
The model trains without errors.
Environment (please complete the following information):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- PyTorch or TensorFlow version (use command below): PyTorch 1.8.1
- Python version: Python 3.7.10
- CUDA/cuDNN version: CUDA 10.2 cuDNN 7.6.5
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 15 (10 by maintainers)
@loganhart420 I recommend that you use the same speaker encoder used in the paper, available here (trained for 330k steps).
In SC-GlowTTS the quality of the speaker encoder is fundamental, because the model doesn't receive any other speaker information.
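If it helps, here is a rough sketch of how the external speaker embeddings get wired into the training config. The key names below are what I remember from the multi-speaker configs of that era and may differ in your TTS version, so check them against the config shipped with the release.

```python
import json

# Rough sketch (key names may differ across TTS versions): point the training
# config at a speakers.json of embeddings computed with the recommended
# speaker encoder checkpoint.
with open("config.json") as f:
    config = json.load(f)

config["use_external_speaker_embedding_file"] = True
# Placeholder path, e.g. the output of TTS/bin/compute_embeddings.py run with
# the encoder checkpoint linked above.
config["external_speaker_embedding_file"] = "/data/speakers.json"

with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
```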
Since your batch size is smaller, you should train for more steps. In addition, in the paper we trained the model for 150k steps on VCTK, which is much smaller and has only 108 speakers. So, since you are training on a larger dataset, you need to train for more steps.
Maybe @Edresson can help, as he is the one who trained the models.
My take is that LibriTTS is a harder dataset, and it is more difficult to reach the same quality on it.