TTS: [Bug] Glow-TTS trained on multiple GPUs gets KeyError: 'avg_loss'

Hi! When I follow the recipes to train a Glow-TTS model, I get this error:

 ! Run is kept in /workspace/tts/glow_tts/glow_tts_chinese-September-20-2021_02+44PM-0000000
Traceback (most recent call last):
  File "/workspace/TTS/TTS/trainer.py", line 919, in fit
    self._fit()
  File "/workspace/TTS/TTS/trainer.py", line 904, in _fit
    self.train_epoch()
  File "/workspace/TTS/TTS/trainer.py", line 738, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/workspace/TTS/TTS/trainer.py", line 685, in train_step
    target_avg_loss = self._pick_target_avg_loss(self.keep_avg_train)
  File "/workspace/TTS/TTS/trainer.py", line 957, in _pick_target_avg_loss
    target_avg_loss = keep_avg_target["avg_loss"]
  File "/workspace/TTS/TTS/utils/generic_utils.py", line 155, in __getitem__
    return self.avg_values[key]
KeyError: 'avg_loss'
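
If I read the traceback correctly, KeepAverage.__getitem__ is a plain dict lookup into self.avg_values, so the key 'avg_loss' only exists after a loss value has actually been recorded. My guess is that on the multi-GPU run the train step's loss dict never reaches keep_avg_train before _pick_target_avg_loss queries it. Here is a minimal sketch of that mechanism (simplified for illustration, not the library's exact code):

```python
class KeepAverage:
    """Keeps running averages of reported values, keyed like 'avg_loss'."""

    def __init__(self):
        self.avg_values = {}
        self.iters = {}

    def __getitem__(self, key):
        # The line that raises in the traceback: a plain dict lookup,
        # so any key that was never reported is a KeyError.
        return self.avg_values[key]

    def update_value(self, name, value):
        # 'avg_loss' only comes into existence once a value for it
        # has been reported at least once.
        if name not in self.avg_values:
            self.avg_values[name] = value
            self.iters[name] = 1
        else:
            self.iters[name] += 1
            self.avg_values[name] += (value - self.avg_values[name]) / self.iters[name]


keeper = KeepAverage()
# keeper.update_value("avg_loss", 1.23)  # never happens in the failing run ...
try:
    print(keeper["avg_loss"])
except KeyError as err:
    print("KeyError:", err)              # ... so the lookup fails exactly as above
```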

Also, current_lr is always shown as 0.00000 (screenshot attached).
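
The current_lr reading may be rounding rather than a second bug: assuming the recipe uses a Noam-style warmup scheduler (common in Glow-TTS configs; base_lr=1e-3 and warmup_steps=4000 below are assumed example values, not taken from my config), the learning rate over the first few hundred steps is on the order of 1e-6, which a five-decimal log line prints as 0.00000:

```python
def noam_lr(step: int, base_lr: float = 1e-3, warmup_steps: int = 4000) -> float:
    """Learning rate at `step` under a Noam warmup/decay schedule."""
    step = max(1, step)
    return base_lr * warmup_steps**0.5 * min(step * warmup_steps**-1.5, step**-0.5)

for step in (1, 10, 100, 1000, 4000):
    # values for the first few hundred steps round to 0.00000 at five decimals
    print(f"step {step:>5}: lr = {noam_lr(step):.8f}")
```

At step == warmup_steps the value climbs back to base_lr, so the displayed rate should start moving once training survives past the crash above.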

My training script: train.py.txt

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (6 by maintainers)

Most upvoted comments

My Environment:

  • Docker: pytorch:1.9.0-cuda11.1-cudnn8-runtime
  • PyTorch and TensorFlow versions: 1.9.0 & 2.5.0
  • Python version: 3.7.10
  • CUDA/cuDNN versions: 11.1 / 8
  • GPU model and memory: GeForce RTX 3090 x2