ForwardTacotron: Problem training for a new language
I am trying to train the model on a Marathi dataset. Strangely, espeak does not seem to support it, although Marathi is listed on its Supported Languages page.
```
In [3]: ph = to_phonemes("प्रदर्शनों के दौरान पुलिस की हिंसा और चुनावों में कथित धोखाधड़ी के ख़िलाफ़ बेलारूस में लोगों का गुस्सा बढ़ता ही जा रहा है")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-944424e45249> in <module>
----> 1 ph = to_phonemes("प्रदर्शनों के दौरान पुलिस की हिंसा और चुनावों में कथित धोखाधड़ी के ख़िलाफ़ बेलारूस में लोगों का गुस्सा बढ़ता ही जा रहा है")

<ipython-input-2-a226f9833051> in to_phonemes(text)
      9                          njobs=1,
     10                          punctuation_marks=';:,.!?¡¿—…"«»“”()',
---> 11                          language_switch='remove-flags')
     12     phonemes = phonemes.replace('—', '-')
     13     return phonemes

~/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/phonemizer/phonemize.py in phonemize(text, language, backend, separator, strip, preserve_punctuation, punctuation_marks, with_stress, language_switch, njobs, logger)
    158             with_stress=with_stress,
    159             language_switch=language_switch,
--> 160             logger=logger)
    161     elif backend == 'espeak-mbrola':
    162         phonemizer = backends[backend](

~/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/phonemizer/backend/espeak.py in __init__(self, language, punctuation_marks, preserve_punctuation, language_switch, with_stress, logger)
    145         super().__init__(
    146             language, punctuation_marks=punctuation_marks,
--> 147             preserve_punctuation=preserve_punctuation, logger=logger)
    148         self.logger.debug('espeak is %s', self.espeak_path())
    149 

~/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/phonemizer/backend/base.py in __init__(self, language, punctuation_marks, preserve_punctuation, logger)
     52             raise RuntimeError(
     53                 'language "{}" is not supported by the {} backend'
---> 54                 .format(language, self.name()))
     55         self.language = language
     56 

RuntimeError: language "mr" is not supported by the espeak backend
```
My only option now is to run the model directly on graphemes. For Indic scripts there is very little difference between graphemes and phonemes, so I made the following changes.
cleaners.py
```python
def to_phonemes(text):
    # text = text.replace('-', '—')
    # phonemes = phonemize(text,
    #                      language=hp.language,
    #                      backend='espeak',
    #                      strip=True,
    #                      preserve_punctuation=True,
    #                      with_stress=False,
    #                      njobs=1,
    #                      punctuation_marks=';:,.!?¡¿—…"«»“”()',
    #                      language_switch='remove-flags')
    # phonemes = phonemes.replace('—', '-')
    phonemes = text
    return phonemes
```
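A slightly safer variant of this passthrough (a hypothetical helper, not part of the repo) validates that every input character appears in the model's symbol set, so an unseen character fails loudly during preprocessing instead of causing a silent lookup problem at training time:

```python
# Hypothetical hardened grapheme passthrough: reject any character that is
# not in the declared symbol set (Devanagari block plus pad/punctuation).
_phones = "ँंःअआइईउऊऋऌऍऎएऐऑऒओऔकखगघङचछजझञटठडढणतथदधनऩपफबभमयरऱलळऴवशषसह़ऽािीुूृॄॅॆेैॉॊोौ्ॐक़ख़ग़ज़ड़ढ़फ़य़ॠ॰ॲ"
_known = set("_-!'(),.:;? " + _phones)

def to_phonemes(text):
    # Collect every character the symbol set does not cover.
    unknown = {ch for ch in text if ch not in _known}
    if unknown:
        raise ValueError(f"characters missing from symbol set: {sorted(unknown)}")
    return text
```

This catches stray Latin characters or punctuation in the transcripts before they reach the model.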
symbols.py
```python
""" from https://github.com/keithito/tacotron """

"""
Defines the set of symbols used in text input to the model.

The default is a set of ASCII characters that works well for English or text
that has been run through Unidecode. For other data, you can modify _characters.
See TRAINING_DATA.md for details.
"""
from utils.text import cmudict

_pad = '_'
_punctuation = "!'(),.:;? "
_special = '-'

# Prepend "@" to ARPAbet symbols to ensure uniqueness (some are the same as uppercase letters):
_arpabet = ['@' + s for s in cmudict.valid_symbols]

# Phonemes
# _vowels = 'iyɨʉɯuɪʏʊeøɘəɵɤoɛœɜɞʌɔæɐaɶɑɒᵻ'
# _non_pulmonic_consonants = 'ʘɓǀɗǃʄǂɠǁʛ'
# _pulmonic_consonants = 'pbtdʈɖcɟkɡqɢʔɴŋɲɳnɱmʙrʀⱱɾɽɸβfvθðszʃʒʂʐçʝxɣχʁħʕhɦɬɮʋɹɻjɰlɭʎʟ'
# _suprasegmentals = 'ˈˌːˑ'
# _other_symbols = 'ʍwɥʜʢʡɕʑɺɧ'
# _diacrilics = 'ɚ˞ɫ'
_phones = 'ँंःअआइईउऊऋऌऍऎएऐऑऒओऔकखगघङचछजझञटठडढणतथदधनऩपफबभमयरऱलळऴवशषसह़ऽािीुूृॄॅॆेैॉॊोौ्ॐक़ख़ग़ज़ड़ढ़फ़य़ॠ॰ॲ'

phonemes = sorted(list(
    _pad
    + _punctuation
    + _special
    + _phones
    # + _non_pulmonic_consonants
    # + _pulmonic_consonants
    # + _suprasegmentals
    # + _other_symbols
    # + _diacrilics
))
```
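An alternative to hardcoding the Devanagari block (a hypothetical sketch, not code from the repo) is to derive the symbol inventory from the training transcripts themselves, which guarantees every character in the data is covered:

```python
# Hypothetical helper: build the symbol list from the transcripts so the
# model's inventory always matches the data exactly.
def build_symbol_list(transcripts, pad="_", punctuation="!'(),.:;? ", special="-"):
    chars = set()
    for line in transcripts:
        chars.update(line)
    extras = set(punctuation) | set(special)
    # Pad goes first so it maps to index 0; everything else is sorted.
    return [pad] + sorted((chars | extras) - {pad})
```

Feeding it the cleaned text column of the metadata file (format depends on your dataset) would replace the hand-maintained `_phones` string.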
When I run `python preprocess.py --path /home/ubuntu/datasets/Marathi_trim/`, I get this error:
```
/home/ubuntu/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location.
Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0.
  from numba.decorators import jit as optional_jit

35999 wav files found in "/home/ubuntu/datasets/Marathi_trim/"

+-------------+-----------+--------+------------+-----------+----------------+
| Sample Rate | Bit Depth | Mu Law | Hop Length | CPU Usage | Num Validation |
+-------------+-----------+--------+------------+-----------+----------------+
|    22050    |     9     |  True  |    256     |  103/104  |      200       |
+-------------+-----------+--------+------------+-----------+----------------+

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "preprocess.py", line 56, in process_wav
    m, x = convert_file(path)
  File "preprocess.py", line 42, in convert_file
    peak = np.abs(y).max()
  File "/home/ubuntu/.virtualenvs/forwardtacoenv/lib/python3.6/site-packages/numpy/core/_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "preprocess.py", line 91, in <module>
    for i, (item_id, length, cleaned_text) in enumerate(pool.imap_unordered(process_wav, wav_files), 1):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 735, in next
    raise value
ValueError: zero-size array to reduction operation maximum which has no identity
```
Am I making a mistake somewhere? Thanks in advance!
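The `zero-size array` error means the audio loader returned an empty signal for at least one file, typically a zero-length or corrupt wav in the dataset. A stdlib-only pre-flight scan (a hypothetical script, not part of the repo) can locate the offending files before preprocessing:

```python
# Hypothetical pre-flight check: list wav files with no audio frames,
# which is exactly what makes np.abs(y).max() fail in preprocess.py.
import wave
from pathlib import Path

def find_empty_wavs(root):
    bad = []
    for p in Path(root).rglob("*.wav"):
        try:
            with wave.open(str(p)) as w:
                if w.getnframes() == 0:
                    bad.append(p)
        except wave.Error:
            # An unreadable header counts as bad too.
            bad.append(p)
    return bad
```

Deleting (or re-exporting) the files this reports should let `preprocess.py` run through the whole dataset.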
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 17
On a side note: if you need a better phonemizer, you could check out my new repo https://github.com/as-ideas/DeepPhonemizer. You would have to train your own phonemizer on an Indic phoneme dataset; you could then use ForwardTacotron with `no_cleaners` and `use_phonemes=False`.