StyleSpeech: time dimension doesn't match
```
Prepare training ...
Number of StyleSpeech Parameters: 28197333
Removing weight norm...
Traceback (most recent call last):
  File "train.py", line 224, in <module>
    main(args, configs)
  File "train.py", line 98, in main
    output = (None, None, model((batch[2:-5])))
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 144, in forward
    d_control,
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/StyleSpeech.py", line 88, in G
    d_control,
  File "/share/mini1/sw/std/python/anaconda3-2019.07/v3.7/envs/StyleSpeech/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/share/mini1/res/t/vc/studio/timap-en/libritts/StyleSpeech/model/modules.py", line 417, in forward
    x = x + pitch_embedding
RuntimeError: The size of tensor a (132) must match the size of tensor b (130) at non-singleton dimension 1
```
I think it might be because of the MFA setup I used. Following https://montreal-forced-aligner.readthedocs.io/en/latest/getting_started.html, I installed MFA through conda.
Then I ran

```
mfa align raw_data/LibriTTS lexicon/librispeech-lexicon.txt english preprocessed_data/LibriTTS
```

instead of the command you showed, because with the conda install I couldn't find a way to run it the way you showed.
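For what it's worth, the error itself says the phoneme sequence produced by `text_to_sequence` (132 symbols) is two symbols longer than the phoneme-level pitch/duration targets extracted from the MFA TextGrids (130), so the mismatch is most likely introduced at preprocessing time rather than in the model. A quick way to narrow it down is to compare the two lengths for every utterance. The sketch below is not from the repo; it assumes the FastSpeech2-style layout of `preprocessed_data/LibriTTS` (a `train.txt` with `basename|speaker|{phones}|text` lines, per-utterance `.npy` files under `duration/`, and the `english_cleaners` cleaner), so adjust paths and field order to your setup.

```python
# Hypothetical sanity check: compare the text_to_sequence length with the
# MFA-derived duration length for each utterance in train.txt.
# Paths, file naming, and cleaner names are assumptions about a
# FastSpeech2-style preprocessing layout; adjust them to your fork.
import os
import numpy as np
from text import text_to_sequence  # same module the training code uses

PREPROCESSED_DIR = "preprocessed_data/LibriTTS"   # assumption
CLEANERS = ["english_cleaners"]                   # should match preprocess.yaml

with open(os.path.join(PREPROCESSED_DIR, "train.txt"), encoding="utf-8") as f:
    for line in f:
        basename, speaker, phones, *_ = line.strip().split("|")
        seq_len = len(text_to_sequence(phones, CLEANERS))
        dur_path = os.path.join(
            PREPROCESSED_DIR, "duration",
            f"{speaker}-duration-{basename}.npy",  # assumed naming convention
        )
        dur_len = len(np.load(dur_path))
        if seq_len != dur_len:
            print(f"{basename}: text_to_sequence={seq_len} vs duration={dur_len}")
```

If this prints any mismatches, the problem is that `text_to_sequence` and the TextGrids disagree about the phoneme inventory for those utterances, which is exactly the situation described in the maintainer's reply below.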
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 24 (11 by maintainers)
Gotcha, I should have mentioned this first: you have to modify `/text` in your case, where the target language is not English. In the current code, the output of the `text_to_sequence` function differs from the MFA output based on 'raw_data/mls/german-lexicon.txt'. To resolve this, you have to make the two outputs match. This also matters at inference time, where we use exactly the same function in `/text`. Phonemes that are missing on the MFA side must also be missing here, which is the part you must modify for your language. Again, you need to make sure that the output of the `text_to_sequence` function always matches the TextGrid's phoneme sequence (the MFA lexicon).
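To make that comparison concrete, here is a rough sketch (not from the repo) of how you might line up the `text_to_sequence` output against the phones tier of one MFA TextGrid, using the `tgt` library that FastSpeech2-style preprocessors typically rely on. The TextGrid path, the `"phones"` tier name, the silence labels, and the leading `@` marker on ARPAbet entries in `text.symbols` are all assumptions about this particular fork; adapt them as needed.

```python
# Rough sketch: compare text_to_sequence output with the TextGrid phones tier
# for a single utterance. Everything below the imports is illustrative.
import tgt
from text import text_to_sequence
from text.symbols import symbols

id_to_symbol = {i: s for i, s in enumerate(symbols)}

phones = "{HH AH0 L OW1}"                    # example phoneme string (assumption)
tg_path = "path/to/utterance.TextGrid"       # placeholder path

# ARPAbet symbols are assumed to be stored with a leading "@" in symbols.py,
# so strip it before comparing against the raw TextGrid labels.
sequence = [
    id_to_symbol[i].lstrip("@")
    for i in text_to_sequence(phones, ["english_cleaners"])
]

tier = tgt.io.read_textgrid(tg_path).get_tier_by_name("phones")
mfa_phones = [
    interval.text
    for interval in tier._objects           # mirrors the preprocessor's usage
    if interval.text not in ("sil", "sp", "spn", "")
]

print("text_to_sequence:", sequence)
print("MFA TextGrid:    ", mfa_phones)
print("lengths:", len(sequence), "vs", len(mfa_phones))
```

Any symbol that appears on one side but not the other (extra punctuation, unseen phonemes, different silence handling) is a candidate for the two-symbol offset seen in the traceback above.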