StyleSpeech: time dimension doesn't match

```
Training:   0%|          | 0/200000 [00:00<?, ?it/s]
Epoch 1:    0%|          | 0/454 [00:00<?, ?it/s]
Prepare training …
Number of StyleSpeech Parameters: 28197333
Removing weight norm…
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    main(args, configs)
  File "train.py", line 98, in main
    output = (None, None, model((batch[2:-5])))
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 144, in forward
    d_control,
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 88, in G
    d_control,
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/modules.py", line 417, in forward
    x = x + pitch_embedding
RuntimeError: The size of tensor a (132) must match the size of tensor b (130) at non-singleton dimension 1
Training:   0%|          | 1/200000 [00:02<166:02:12, 2.99s/it]
```

I think it might be because of the MFA setup I used. As described in https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html, I installed MFA through conda.

Then I ran mfa align raw_data/LibriTTS lexicon/librispeech-lexicon.txt english preprocessed_data/LibriTTS instead of the command you showed. I can't find a way to run it the way you showed, because I installed MFA through conda.
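If it helps to narrow this down: the failure at x = x + pitch_embedding means the phoneme sequence coming out of text_to_sequence (length 132) is longer than the duration/pitch targets extracted from the MFA TextGrids (length 130), i.e. the alignment produced by the conda-installed MFA and the text front-end disagree for at least some utterances. The sketch below is one rough way to locate such utterances before training; it assumes a FastSpeech2-style preprocessed_data layout (train.txt with |-separated fields and per-utterance duration .npy files), so the paths, field order, and cleaner list would need to match your own preprocessing config.

```python
# Rough length-consistency check over the preprocessed training set.
# Assumed layout: train.txt lines "basename|speaker|{phonemes}|raw_text" and
# per-utterance duration files "duration/{speaker}-duration-{basename}.npy".
import os
import numpy as np
from text import text_to_sequence  # the repo's text front-end (signature assumed)

prep_dir = "preprocessed_data/LibriTTS"  # adjust to your dataset

with open(os.path.join(prep_dir, "train.txt"), encoding="utf-8") as f:
    for line in f:
        basename, speaker, phones, *_ = line.strip().split("|")
        # Cleaner list left empty here since `phones` is already a phoneme string;
        # use the same cleaners as your training config if it differs.
        seq_len = len(text_to_sequence(phones, []))
        dur = np.load(os.path.join(
            prep_dir, "duration", f"{speaker}-duration-{basename}.npy"))
        if seq_len != len(dur):
            # Phoneme-level pitch/energy targets normally share the duration length,
            # so these are the utterances that will crash at x + pitch_embedding.
            print(f"{basename}: text_to_sequence={seq_len} duration={len(dur)}")
```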

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 1
  • Comments: 24 (11 by maintainers)

Most upvoted comments

Gotcha, I should have mentioned this first: you have to modify /text, since in your case the target language is not English. In the current code, the output of the text_to_sequence function differs from the MFA output based on 'raw_data/mls/german-lexicon.txt'. To resolve this, you have to make the two outputs match. This also matters at inference time, where we use the same function in /text.
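One rough way to see which phonemes /text still needs is to diff the lexicon against the symbol table. This sketch assumes a plain "WORD PH1 PH2 ..." lexicon format and that /text exposes its symbol list as text.symbols.symbols, with phoneme entries possibly carrying an "@" prefix as in the FastSpeech2-style front-end; adjust the names if your fork differs.

```python
# List lexicon phonemes that the symbol set in /text does not cover.
from text.symbols import symbols

lexicon_path = "raw_data/mls/german-lexicon.txt"

known = {s.lstrip("@") for s in symbols}          # strip the assumed "@" phoneme prefix
lexicon_phones = set()
with open(lexicon_path, encoding="utf-8") as f:
    for line in f:
        parts = line.split()
        if len(parts) > 1:                        # "WORD PH1 PH2 ..." -> keep the phonemes
            lexicon_phones.update(parts[1:])

missing = sorted(lexicon_phones - known)
print("Lexicon phonemes missing from text/symbols.py:", missing)
```

Anything printed as missing would be silently dropped by text_to_sequence, which is exactly what produces the length mismatch against the TextGrid durations.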

Exactly. The missing phonemes must be missing here as well, and that is the part you have to modify for your language. Again, you need to make sure that the output of the text_to_sequence function always matches the TextGrid's phoneme sequence (MFA lexicons).
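For a single utterance, one way to verify this is to read the aligned TextGrid and compare its phones tier against what text_to_sequence returns. A minimal sketch, assuming the tgt library (commonly used by FastSpeech2-style preprocessors to read TextGrids), a tier named "phones", and the curly-brace phoneme format of the text front-end; the path is a placeholder.

```python
# Sanity check for one utterance: MFA phonemes vs. text_to_sequence ids.
import tgt
from text import text_to_sequence

tg_path = "path/to/utterance.TextGrid"  # placeholder; point at a real aligned TextGrid
tier = tgt.io.read_textgrid(tg_path).get_tier_by_name("phones")
# Drop silence/unknown markers, which the variance targets usually exclude as well.
mfa_phones = [iv.text for iv in tier.intervals if iv.text not in ("", "sil", "sp", "spn")]

seq = text_to_sequence("{" + " ".join(mfa_phones) + "}", [])
print(len(mfa_phones), "MFA phonemes ->", len(seq), "ids from text_to_sequence")
# If the two lengths differ, some MFA phonemes are being dropped as unknown symbols
# and must be added to /text before preprocessing and training.
```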