NeMo: Lots of "... contains unknown char/phoneme...Symbol will be skipped." warnings popping up.

I am trying to train a FastPitch model on the Blizzard 2013 dataset. As soon as training started, many warnings like the following kept appearing:

[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [And then on her BRIY1FLIY0 IH0KSPREH1SIH0NG HHER1 sorrow for what HHIY1 must have suffered, HHIY1 replied,] contains unknown char/phoneme: [A].Original text: [And then on her briefly expressing her sorrow for what he must have suffered, he replied,]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [How different from what it was the last two dances.] contains unknown char/phoneme: [H].Original text: [How different from what it was the last two dances.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [which was VEH1RIY0 LIH1TAH0L relieved by the long speeches of MIH1STER0 Collins.] contains unknown char/phoneme: [C].Original text: [which was very little relieved by the long speeches of mister Collins.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [Henry is different. He LAH1VZ to be DUW1IH0NG.] contains unknown char/phoneme: [H].Original text: [Henry is different. He loves to be doing.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [Henry is different. He LAH1VZ to be DUW1IH0NG.] contains unknown char/phoneme: [H].Original text: [Henry is different. He loves to be doing.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [And if I HHAE1V good luck, your mother SHAE1L have some.] contains unknown char/phoneme: [A].Original text: [And if I have good luck, your mother shall have some.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [And if I HHAE1V good luck, your mother SHAE1L have some.] contains unknown char/phoneme: [I].Original text: [And if I have good luck, your mother shall have some.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [and I BIH0LIY1V there is SKEH1RSLIY0 a YAH1NG LEY1DIY0 in the United Kingdoms.] contains unknown char/phoneme: [I].Original text: [and I believe there is scarcely a young lady in the United Kingdoms.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [and I BIH0LIY1V there is SKEH1RSLIY0 a YAH1NG LEY1DIY0 in the United Kingdoms.] contains unknown char/phoneme: [U].Original text: [and I believe there is scarcely a young lady in the United Kingdoms.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [It is NAA1T AE1T AO1L what I LAY1K, HHIY1 continued.] contains unknown char/phoneme: [I].Original text: [It is not at all what I like, he continued.]. Symbol will be skipped.
[NeMo W 2023-01-31 10:06:18 tts_tokenizers:478] Text: [It is NAA1T AE1T AO1L what I LAY1K, HHIY1 continued.] contains unknown char/phoneme: [I].Original text: [It is not at all what I like, he continued.]. Symbol will be skipped.

However, the flagged symbols all come from ordinary words in the original text. Is this expected? What should I do about it? Thanks!
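The pattern in these warnings can be reproduced with a toy model. In every log line above, the flagged symbol is an uppercase letter inside a word that was left as ordinary spelling (And, How, Collins, Henry, He, I, It, United), while lowercase words and ARPABET phonemes such as HHIY1 pass through. The K in Kingdoms is not flagged, which is consistent with K also being an ARPABET consonant. The sketch below assumes a hypothetical vocabulary of lowercase letters, a few punctuation marks, and ARPABET phonemes; `VOCAB`, `flatten`, and `tokenize` are illustrative names, not NeMo's API.

```python
import string
from typing import List, Tuple, Union

# The 39 CMUdict ARPABET phonemes. In G2P output, vowels carry a stress digit (0, 1, 2).
ARPABET_CONSONANTS = ["B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L", "M", "N",
                      "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH"]
ARPABET_VOWELS = ["AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
                  "IH", "IY", "OW", "OY", "UH", "UW"]

# Hypothetical vocabulary (an assumption, not NeMo's real one): lowercase graphemes,
# a little punctuation, and phonemes. It contains NO uppercase graphemes.
VOCAB = set(string.ascii_lowercase) | set(" .,?!'-")
VOCAB |= set(ARPABET_CONSONANTS)
VOCAB |= {vowel + stress for vowel in ARPABET_VOWELS for stress in "012"}

# A word is either left as spelling (str) or converted to phonemes (list of str).
Word = Union[str, List[str]]


def flatten(words: List[Word]) -> List[str]:
    """Split mixed G2P output into symbols: one per character or one per phoneme."""
    symbols: List[str] = []
    for i, word in enumerate(words):
        if i > 0:
            symbols.append(" ")
        # extend() over a str yields characters; over a list it yields phonemes.
        symbols.extend(word)
    return symbols


def tokenize(words: List[Word], lowercase_graphemes: bool = False) -> Tuple[List[str], List[str]]:
    """Return (kept, skipped) symbols; skipped ones mimic 'Symbol will be skipped.'"""
    if lowercase_graphemes:
        # Lowercase only the words kept as spelling; phoneme words stay uppercase.
        words = [w.lower() if isinstance(w, str) else w for w in words]
    kept: List[str] = []
    skipped: List[str] = []
    for symbol in flatten(words):
        if symbol in VOCAB:
            kept.append(symbol)
        else:
            skipped.append(symbol)
    return kept, skipped


# "Henry is different. He LAH1VZ to be DUW1IH0NG" (final period dropped for brevity).
henry = ["Henry", "is", "different.", "He", ["L", "AH1", "V", "Z"], "to", "be",
         ["D", "UW1", "IH0", "NG"]]
print(tokenize(henry)[1])                            # ['H', 'H']
print(tokenize(["the", "United", "Kingdoms"])[1])    # ['U']  (K passes as a phoneme)
print(tokenize(henry, lowercase_graphemes=True)[1])  # []
```

In this toy, lowercasing the words that stay as graphemes removes every warning, and it also shows a quieter problem: an uppercase letter that happens to match a single-letter ARPABET consonant (K, S, T, ...) is silently read as a phoneme instead of being flagged. Whether the real NeMo tokenizer and G2P setup behave exactly this way is an assumption to verify against the configuration in use.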

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

@XuesongYang Sorry, I missed this issue. I will remove all unnecessary @torch.jit.script decorators. The TorchScript (JIT script) compiler is very strict about type annotations matching the actual argument types. I will notify here again after I open a PR for this.
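A minimal sketch of the strictness described above, relying only on standard TorchScript behaviour: a parameter without an annotation is compiled as `torch.Tensor`, so passing a Python float raises an error at call time even though the same call works in eager mode. The functions `scale_untyped` and `scale_typed` are hypothetical and not taken from NeMo.

```python
import torch


@torch.jit.script
def scale_untyped(x, factor):
    # No annotations: TorchScript infers BOTH parameters as torch.Tensor.
    return x * factor


@torch.jit.script
def scale_typed(x: torch.Tensor, factor: float) -> torch.Tensor:
    # Explicit annotation: `factor` is compiled as a Python float.
    return x * factor


x = torch.ones(3)
print(scale_typed(x, 2.0))  # tensor([2., 2., 2.])

try:
    # A float where the compiled signature expects a Tensor is rejected.
    scale_untyped(x, 2.0)
except RuntimeError as err:
    print("TorchScript rejected the call:", str(err).splitlines()[0])
```

Adding the explicit `factor: float` annotation, or dropping the decorator where scripting is not needed, avoids the mismatch; which option fits a given module is a judgment call.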