NeMo: Cannot run Mixer TTS colab with Mixer-TTS-X model

Hi all. I’m struggling to run a training with the Mixer-TTS-X model. I’m following the tutorial that covers training both FastPitch and Mixer-TTS.

Modifications I’ve made:

pretrained_model = "tts_en_lj_mixerttsx"

Adding the ‘raw_texts’ argument when generating a spectrogram:

spectrogram = spec_gen.generate_spectrogram(tokens=tokens, raw_texts=["Hey, this produces speech!"])

Correcting this import:

from nemo.collections.tts.torch.data import MixerTTSXDataset

Just in case:

from nemo.collections.tts.torch.tts_data_types import LMTokens
from transformers.models.albert.tokenization_albert import AlbertTokenizer

Adding the lm_tokenizer parameter here:

def pre_calculate_supplementary_data(sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs):
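For context, this helper constructs a MixerTTSXDataset and forwards its parameters to it; anything the dataset’s `__init__` does not name explicitly falls through `**kwargs` to the `add_*` methods for each supplementary data type. A minimal, NeMo-free sketch of that forwarding pattern (the classes and parameter names below are illustrative stand-ins, not the real NeMo API):

```python
# Illustrative sketch of how keyword arguments flow through to the dataset.
# BaseDataset / XDataset are stand-ins, not the real NeMo classes.

class BaseDataset:
    def __init__(self, text_tokenizer=None, sup_data_types=None, **kwargs):
        self.text_tokenizer = text_tokenizer
        self.sup_data_types = sup_data_types or []
        # Each supplementary data type gets a chance to consume extra kwargs,
        # mirroring the getattr(self, f"add_{data_type.name}")(**kwargs) loop
        # seen in the traceback.
        for data_type in self.sup_data_types:
            getattr(self, f"add_{data_type}")(**kwargs)

    def add_pitch(self, **kwargs):
        # Uses .get() with a default, so a missing key is harmless here.
        self.pitch_fmin = kwargs.get("pitch_fmin", 65)


class XDataset(BaseDataset):
    def add_lm_tokens(self, **kwargs):
        # .pop() without a default raises KeyError if the caller
        # never passed lm_model.
        self.lm_model = kwargs.pop("lm_model")


ds = XDataset(
    text_tokenizer="tok",
    sup_data_types=["pitch", "lm_tokens"],
    lm_model="albert",  # without this, add_lm_tokens raises KeyError
)
print(ds.lm_model)  # → albert
```

The takeaway is that every extra keyword the `add_*` methods expect has to be threaded through the dataset constructor call, not just the named parameters of the helper.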

Getting the right config file:

&& wget https://raw.githubusercontent.com/NVIDIA/NeMo/$BRANCH/examples/tts/conf/mixer-tts-x.yaml

Creating the lm_tokenizer object:

lm_tokenizer = LMTokens()

And after running the command:

mixer_tts_sup_data_path = "mixer_tts_x_sup_data_folder"
sup_data_types = ["align_prior_matrix", "pitch", "lm_tokens"]

pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
    mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs
)

I get the following error:

[NeMo I 2022-08-24 22:00:27 data:216] Loading dataset from tests/data/asr/an4_train.json.
30it [00:00, 712.41it/s][NeMo I 2022-08-24 22:00:27 data:253] Loaded dataset with 30 files.
[NeMo I 2022-08-24 22:00:27 data:255] Dataset contains 0.02 hours.
[NeMo I 2022-08-24 22:00:27 data:357] Pruned 0 files. Final dataset contains 30 files
[NeMo I 2022-08-24 22:00:27 data:360] Pruned 0.00 hours. Final dataset contains 0.02 hours.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
[<ipython-input-21-b87416bfd412>](https://localhost:8080/#) in <module>
      3 
      4 pitch_mean, pitch_std, pitch_min, pitch_max = pre_calculate_supplementary_data(
----> 5     mixer_tts_sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs
      6 )

3 frames
[<ipython-input-20-7893d3034131>](https://localhost:8080/#) in pre_calculate_supplementary_data(sup_data_path, sup_data_types, text_tokenizer, text_normalizer, lm_tokenizer, text_normalizer_call_kwargs)
     22             text_normalizer=text_normalizer,
     23             lm_tokenizer=lm_tokenizer,
---> 24             text_normalizer_call_kwargs=text_normalizer_call_kwargs
     25         ) 
     26         stage2dl[stage] = torch.utils.data.DataLoader(ds, batch_size=1, collate_fn=ds._collate_fn, num_workers=1)

[/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py](https://localhost:8080/#) in __init__(self, **kwargs)
    759 class MixerTTSXDataset(TTSDataset):
    760     def __init__(self, **kwargs):
--> 761         super().__init__(**kwargs)
    762 
    763     def _albert(self):

[/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py](https://localhost:8080/#) in __init__(self, manifest_filepath, sample_rate, text_tokenizer, tokens, text_normalizer, text_normalizer_call_kwargs, text_tokenizer_pad_id, sup_data_types, sup_data_path, max_duration, min_duration, ignore_file, trim, trim_ref, trim_top_db, trim_frame_length, trim_hop_length, n_fft, win_length, hop_length, window, n_mels, lowfreq, highfreq, **kwargs)
    323 
    324         for data_type in self.sup_data_types:
--> 325             getattr(self, f"add_{data_type.name}")(**kwargs)
    326 
    327     @staticmethod

[/usr/local/lib/python3.7/dist-packages/nemo/collections/tts/torch/data.py](https://localhost:8080/#) in add_lm_tokens(self, **kwargs)
    785 
    786     def add_lm_tokens(self, **kwargs):
--> 787         lm_model = kwargs.pop('lm_model')
    788 
    789         if lm_model == "albert":

KeyError: 'lm_model'
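The traceback points at `lm_model = kwargs.pop('lm_model')`: `dict.pop` with a single argument raises `KeyError` when the key is absent, so the dataset constructor must receive an `lm_model` keyword somewhere in the call chain (the code path shown then checks for `"albert"`). A quick pure-Python illustration of the failure mode (the function body is a simplified stand-in, not the real NeMo code):

```python
def add_lm_tokens(**kwargs):
    # dict.pop without a default raises KeyError if the key is missing.
    lm_model = kwargs.pop("lm_model")
    if lm_model == "albert":
        return "loading AlbertTokenizer"
    raise NotImplementedError(f"lm_model={lm_model} not supported")

# Omitting the keyword reproduces the error from the traceback:
try:
    add_lm_tokens()
except KeyError as e:
    print(e)  # → 'lm_model'

# Supplying it succeeds:
print(add_lm_tokens(lm_model="albert"))  # → loading AlbertTokenizer
```

So the error is not about the `lm_tokenizer` object itself; it suggests the dataset never received an `lm_model` keyword argument.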

Any ideas are welcome. Thanks,

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 27

Most upvoted comments

Oh, you should also be able to load the r1.11.0 version of the tutorial, which has the correct path for that version of the repository.

Ahh, I see the problem now. #4690 was merged to main, while #4811 was merged as a bugfix to the r1.11.0 branch. My apologies! In this case you will have to wait until the r1.11.0 branch is merged again with main, which should be within the next few days.

Alternatively, you could try cherry-picking the fix into another branch, if it is urgent.

No problem!

I don’t think anyone on our team has tried it yet, but yes, you should be able to fine-tune as usual.