NeMo: Cannot fine-tune a HiFi-GAN model
Describe the bug
I tried to fine-tune the HiFi-GAN model according to FastPitch_Finetuning.ipynb, but I encountered the following error:
Traceback (most recent call last):
  File "examples/tts/hifigan_finetune.py", line 28, in main
    trainer.fit(model)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
    self._call_and_handle_interrupt(
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
    self._dispatch()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
    self.training_type_plugin.start_training(self)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
    self._results = trainer.run_stage()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1311, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1375, in _run_sanity_check
    self._evaluation_loop.run()
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 110, in advance
    dl_outputs = self.epoch_loop.run(dataloader, dataloader_idx, dl_max_batches, self.num_dataloaders)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self._evaluation_step(batch, batch_idx, dataloader_idx)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 217, in _evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/accelerators/accelerator.py", line 236, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/ddp.py", line 444, in validation_step
    return self.model(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/pytorch_lightning/overrides/base.py", line 92, in forward
    output = self.module.validation_step(*inputs, **kwargs)
  File "/root/NeMo/nemo/collections/tts/models/hifigan.py", line 226, in validation_step
    audio, audio_len, audio_mel = batch
ValueError: not enough values to unpack (expected 3, got 2)
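The unpack at hifigan.py line 226 expects (audio, audio_len, audio_mel), i.e. a dataset that yields precomputed mels, but the validation dataloader is evidently still producing plain (audio, audio_len) pairs. A minimal sketch of the mismatch, using hypothetical tensor shapes rather than NeMo's actual loader:

import torch

# A precomputed-mel (fine-tuning) dataset yields 3 items per batch:
audio = torch.randn(4, 22050)        # hypothetical batch of waveforms
audio_len = torch.full((4,), 22050)  # per-example lengths
audio_mel = torch.randn(4, 80, 86)   # precomputed mel spectrograms
finetune_batch = (audio, audio_len, audio_mel)

# The default validation dataset yields only 2 items:
default_batch = (audio, audio_len)

a, a_len, a_mel = finetune_batch  # unpacks fine
try:
    a, a_len, a_mel = default_batch  # reproduces the reported failure
except ValueError as e:
    print(e)  # not enough values to unpack (expected 3, got 2)

A quick way to confirm which dataset the validation loader was built from (assuming model is the instantiated HifiGanModel; _validation_dl is NeMo's internal handle on the validation dataloader):

val_batch = next(iter(model._validation_dl))
print(len(val_batch))  # 2 = default audio-only dataset, 3 = precomputed mels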
Steps/Code to reproduce bug
Generate the mel files according to the notebook, then run:
python examples/tts/hifigan_finetune.py --config_name=hifigan.yaml model.train_ds.dataloader_params.batch_size=32 model.max_steps=1000 ~model.sched model.optim.lr=0.0001 train_dataset=./hifigan_train_ft.json validation_datasets=./hifigan_val_ft.json exp_manager.exp_dir=hifigan_ft +init_from_nemo_model=tts_hifigan.nemo trainer.check_val_every_n_epoch=10 model/train_ds=train_ds_finetune
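Note that this command overrides only the training dataset group (model/train_ds=train_ds_finetune) but leaves validation on the default audio-only config. Assuming your checkout ships the matching validation group (it should live under examples/tts/conf/hifigan/model/validation_ds/ in NeMo, but verify in your version), the likely fix is to rerun with one extra override, all other options unchanged:

python examples/tts/hifigan_finetune.py --config_name=hifigan.yaml ... model/train_ds=train_ds_finetune model/validation_ds=val_ds_finetune

With that, the validation dataloader also reads the precomputed mels and yields the 3-tuples that validation_step expects.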
Expected behavior
Fine-tuning starts and the validation sanity check completes without errors, as in the notebook.
Environment overview
- Environment location: cloud (AutoDL)
- Method of NeMo install: from source (git clone, then pip install .[tts])
Environment details
PyTorch 1.10.0, Python 3.8, CUDA 11.3
Additional context
GPU: 2 × RTX 3090 (24 GB VRAM each); CPU: AMD EPYC 7543 32-core (30 cores allocated); RAM: 160 GB
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 33 (16 by maintainers)
@Oktai15 Sorry for the late reply. It is now at epoch 14, and has produced a checkpoint. No error. Looks like it works.