WaveGlow: gradient overflow when training
I am training a WaveGlow model from scratch on an 8 kHz sampling-rate dataset and got this error after 1 epoch.
Since PyTorch doesn't fully support RTX GPUs yet, I have to use my old 1050 Ti and set `batch_size` to 1; this is my `config.json`.
Is the `batch_size` being too small causing the problem, or am I using the wrong audio params?
```json
{
  "train_config": {
    "fp16_run": true,
    "output_directory": "checkpoints",
    "epochs": 100000,
    "learning_rate": 1e-4,
    "sigma": 1.0,
    "iters_per_checkpoint": 2000,
    "batch_size": 1,
    "seed": 1234,
    "checkpoint_path": "",
    "with_tensorboard": true
  },
  "data_config": {
    "training_files": "train_files.txt",
    "segment_length": 16000,
    "sampling_rate": 8000,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "mel_fmin": 0.0,
    "mel_fmax": 4000.0
  },
  "dist_config": {
    "dist_backend": "nccl",
    "dist_url": "tcp://localhost:54321"
  },
  "waveglow_config": {
    "n_mel_channels": 80,
    "n_flows": 18,
    "n_group": 8,
    "n_early_every": 4,
    "n_early_size": 2,
    "WN_config": {
      "n_layers": 4,
      "n_channels": 256,
      "kernel_size": 3
    }
  }
}
```
About this issue
- Original URL
- State: open
- Created 4 years ago
- Comments: 18
I noticed that `sigma` is set to 1.0 in `config.json`, but it is 0.666 in the Tacotron2 inference file. Is it supposed to be like that, or do they have to be the same value? Moreover, what is sigma?

@EuphoriaCelestial I picked something similar to the original; the original uses a `segment_length` of 16000 for a 22.05 kHz sample-rate audio file. So I know that the original WaveGlow worked well with segments a little over half a second long. 6144 is a little over half of 8000, and 6144 is divisible by the `hop_length` of 256, so there's no extra padding. It does not have to be perfect, but too small makes it hard to learn low frequencies, and multiples of the `hop_length` waste less compute.

@CookiePPP cool, thank you for the knowledge!
@EuphoriaCelestial I would use a little over half a second again (and a multiple of `hop_length`); anything between 8192 and 12288 would be cool. You can use anything you want, but just don't make it too small is the main thing.
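The suggestions above boil down to two constraints: the segment should be roughly half a second or more of audio, and it should be an exact multiple of `hop_length` so no padding is wasted. A minimal sketch of that arithmetic (the helper name is illustrative, not part of the WaveGlow codebase):

```python
# Pick a segment_length that is (a) close to a target duration in seconds
# and (b) an exact multiple of hop_length, so the STFT frames tile the
# segment with no extra padding.
def pick_segment_length(sampling_rate, hop_length, target_seconds):
    raw = int(sampling_rate * target_seconds)
    # Round down to the nearest multiple of hop_length.
    return (raw // hop_length) * hop_length

# For an 8 kHz dataset with hop_length 256, a ~0.77 s target lands on 6144,
# the value suggested in this thread.
print(pick_segment_length(8000, 256, 0.77))  # -> 6144
```

The same arithmetic explains why values like 8192 or 12288 also work: both are multiples of 256 and correspond to roughly 1 to 1.5 seconds at 8 kHz.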
https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/waveglow/arg_parser.py#L53

The DeepLearningExamples version uses a `segment_length` of 4000 for 22.05 kHz, so it seems to be all over the place…

https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/scripts/train_waveglow.sh

Or does it use a `segment_length` of 8000? I have no idea what's considered normal. I use half a second with my models and it works well enough for me.
okay, I will wait and report later
@EuphoriaCelestial loss scale around 256 is normal. I worry once it goes under 64.
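The loss-scale numbers mentioned here come from dynamic loss scaling in fp16 training: overflowed steps are skipped and the scale is halved, and after a run of clean steps the scale is grown back. A minimal sketch of that behavior (assumed typical behavior, not the exact scaler this repo uses; names are illustrative):

```python
# Dynamic loss scaling, as commonly used for fp16 training:
# on gradient overflow the step is skipped and the scale is halved;
# after growth_interval consecutive clean steps the scale doubles.
class DynamicLossScaler:
    def __init__(self, init_scale=2**15, growth_interval=2000):
        self.scale = float(init_scale)
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, overflowed):
        if overflowed:
            self.scale /= 2          # back off after an overflow
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= 2      # cautiously grow back
                self._good_steps = 0

scaler = DynamicLossScaler(init_scale=512)
for overflow in [False, True, True, False]:
    scaler.update(overflow)
print(scaler.scale)  # 128.0 after two consecutive overflows
```

This is why an occasional "Gradient overflow" message is harmless, but a scale that keeps halving (dropping below ~64) suggests the gradients themselves are diverging.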
@EuphoriaCelestial With a `sampling_rate` of 8000, your `segment_length` of 16000 is 2 seconds long. I don't know what dataset you use, but you might be training on a lot of padded data (files under 2 seconds long will be padded with zeros). You should probably decrease `segment_length` to 6144 or something along those lines and increase `batch_size` to 4.

Just need to match Tacotron2 and you're good.
You have 18 flows. You start with 8 channels, and every 4 flows you output 2 of the channels.
At flow 0 you have 8 channels; after the early outputs at flows 4, 8, 12, and 16, you'd be down to 0 channels for the last flows.
You cannot have 0 channels…
I’m pretty sure this config is not the config you used, as I don’t think this config can start.
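The channel bookkeeping above can be checked with a few lines. This is a sketch of the accounting (assumed to mirror WaveGlow's early-output scheme: after flow 0, every `n_early_every` flows, `n_early_size` channels are split off to the output), not the actual model code:

```python
# Track how many channels each flow operates on, given WaveGlow's
# early-output scheme: at every n_early_every-th flow (except flow 0),
# n_early_size channels are diverted to the output before the flow runs.
def channels_per_flow(n_group, n_flows, n_early_every, n_early_size):
    channels = n_group
    trace = []
    for k in range(n_flows):
        if k % n_early_every == 0 and k > 0:
            channels -= n_early_size
        trace.append(channels)
    return trace

# The config in question: 18 flows, 8 channels, output 2 every 4 flows.
trace = channels_per_flow(8, 18, 4, 2)
print(trace)  # flows 16 and 17 would have to operate on 0 channels
```

With the stock config (12 flows instead of 18) the same accounting leaves 4 channels at the final flow, which is why that config can start and this one cannot.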