TTS: multiband_melgan Vocoder Fails on Step 10000 With KeyError: 'avg_loss_0'

Describe the bug

A multiband_melgan vocoder trained on en, specifically "by_book/male/elliot_miller/pirates_of_ersatz/" (another language was also tried and failed with the same error), fails on step 10000 with KeyError: 'avg_loss_0'.

To Reproduce

Run train_multiband_melgan.py on the en_US dataset.

Expected behavior

No errors.

Logs

No response

Environment

{
    "CUDA": {
        "GPU": [
            "Tesla T4",
            "Tesla T4",
            "Tesla T4",
            "Tesla T4"
        ],
        "available": true,
        "version": "10.2"
    },
    "Packages": {
        "PyTorch_debug": false,
        "PyTorch_version": "1.11.0+cu102",
        "TTS": "0.6.2",
        "numpy": "1.21.6"
    },
    "System": {
        "OS": "Linux",
        "architecture": [
            "64bit",
            "ELF"
        ],
        "processor": "x86_64",
        "python": "3.7.13",
        "version": "#1 SMP Tue Apr 26 20:14:22 UTC 2022"
    }
}

Additional context

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

I haven't checked yet and won't be able to for at least the next two weeks, unfortunately.

If there is a fix, shoot a PR.

There are two issues. One is in the trainer: the configured target_loss is not present in keep_avg_target.avg_values, so the lookup raises KeyError. My patched version:

    def _pick_target_avg_loss(self, keep_avg_target: KeepAverage) -> Dict:
        """Pick the target loss to compare models."""
        # Return the target loss defined in the model config, if its key exists.
        if "target_loss" in self.config and self.config.target_loss:
            if f"avg_{self.config.target_loss}" in keep_avg_target.avg_values:
                return keep_avg_target[f"avg_{self.config.target_loss}"]
            # Workaround: the configured key is missing, so fall back to avg_loss_1.
            print(keep_avg_target.avg_values.keys())
            return keep_avg_target["avg_loss_1"]
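A more defensive variant (a sketch, not the upstream fix) would fall back to whatever averaged loss key is actually available instead of hardcoding avg_loss_1. The KeepAverage class below is a minimal stand-in I am assuming for the trainer's real class, just enough to show the lookup logic:

```python
class KeepAverage:
    """Minimal stand-in for the trainer's KeepAverage (assumed interface)."""

    def __init__(self):
        self.avg_values = {}

    def __getitem__(self, key):
        return self.avg_values[key]


def pick_target_avg_loss(keep_avg, target_loss):
    """Return the configured averaged loss, else fall back to any avg_* key."""
    key = f"avg_{target_loss}"
    if key in keep_avg.avg_values:
        return keep_avg[key]
    # Fall back to the first available averaged loss rather than raising KeyError.
    for k in sorted(keep_avg.avg_values):
        if k.startswith("avg_"):
            return keep_avg[k]
    return None


ka = KeepAverage()
ka.avg_values = {"avg_loss_1": 0.42}
print(pick_target_avg_loss(ka, "loss_0"))  # configured key missing -> 0.42
```

This keeps best-model selection working even when the loss dictionary's key names drift from the config, at the cost of silently comparing on a different loss.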

The other is in gan.py: in _log, outputs[0] comes back as None, so indexing into it fails. My patched version:

    def _log(name: str, ap: AudioProcessor, batch: Dict, outputs: Dict) -> Tuple[Dict, Dict]:
        """Logging shared by the training and evaluation.

        Args:
            name (str): Name of the run. `train` or `eval`,
            ap (AudioProcessor): Audio processor used in training.
            batch (Dict): Batch used in the last train/eval step.
            outputs (Dict): Model outputs from the last train/eval step.

        Returns:
            Tuple[Dict, Dict]: log figures and audio samples.
        """
        # Workaround: outputs[0] can be None, so fall back to outputs[1].
        if outputs[0] is None:
            y_hat = outputs[1]["model_outputs"]
        else:
            y_hat = outputs[0]["model_outputs"]
        y = batch["waveform"]
        figures = plot_results(y_hat, y, ap, name)
        sample_voice = y_hat[0].squeeze(0).detach().cpu().numpy()
        audios = {f"{name}/audio": sample_voice}
        return figures, audios
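The None check matches how GAN training alternates optimizers: each step fills only its own slot in the per-optimizer outputs list, so the other slot can legitimately be None. A minimal sketch of that selection logic in isolation (function name and structure are my own, not from the TTS codebase):

```python
def first_model_outputs(outputs):
    """Pick "model_outputs" from the first non-None entry of a per-optimizer list.

    In GAN training each optimizer step fills only its own slot in `outputs`
    (e.g. generator step -> slot 0, discriminator step -> slot 1), so the
    other slot can legitimately be None.
    """
    for out in outputs:
        if out is not None:
            return out["model_outputs"]
    raise ValueError("all optimizer outputs are None")


# Generator slot empty, discriminator slot populated:
print(first_model_outputs([None, {"model_outputs": [1, 2, 3]}]))  # [1, 2, 3]
```

Scanning the list generalizes the two-branch if/else above to any number of optimizers, and raises explicitly instead of a TypeError when every slot is empty.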

I have put a workaround in place for my use case.