DeepSpeech: Problem with SWC corpus script

  • Have I written custom code (as opposed to running examples on an unmodified clone of the repository): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (our builds, or upstream TensorFlow): Yes
  • TensorFlow version (use command below): b’v1.13.1-0-g6612da8951’ 1.13.1
  • Python version: 3.5
  • Bazel version (if compiling from source): 0.19.2
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 10.0.130
  • GPU model and memory: Quadro RTX 6000, 72GB

Hello Team,

I am trying to use import_swc.py (under bin) to preprocess the SWC corpus. I used the following command:

DeepSpeech/bin/import_swc.py . --language german --normalize --german_alphabet ../../../dependencies/alphabet.txt

However, when I train the DeepSpeech model, the training loss is always infinite. Could you please advise on how to resolve this issue? The logs are below:

WARNING:tensorflow:From /home/LTLab.lan/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.

WARNING:tensorflow:From /home/LTLab.lan/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/python/data/ops/iterator_ops.py:358: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/LTLab.lan/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/contrib/rnn/python/ops/lstm_ops.py:696: to_int64 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
I Initializing variables...
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:18:08 | Steps: 1845 | Loss: inf
Epoch 0 | Validation | Elapsed Time: 0:00:36 | Steps: 139 | Loss: 270.188871 | Dataset: ../german-speech-corpus/delete/swc/dev_swc.csv
I Saved new best validating model with loss 270.188871 to: /home/LTLab.lan/agarwal/.local/share/deepspeech/checkpoints/best_dev-1845
Epoch 1 |   Training | Elapsed Time: 0:17:52 | Steps: 1845 | Loss: inf
Epoch 1 | Validation | Elapsed Time: 0:00:35 | Steps: 139 | Loss: 227.384010 | Dataset: ../german-speech-corpus/delete/swc/dev_swc.csv
WARNING:tensorflow:From /home/LTLab.lan/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/python/training/saver.py:966: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.
I Saved new best validating model with loss 227.384010 to: /home/LTLab.lan/agarwal/.local/share/deepspeech/checkpoints/best_dev-3690
Epoch 2 |   Training | Elapsed Time: 0:17:52 | Steps: 1845 | Loss: inf
Epoch 2 | Validation | Elapsed Time: 0:00:35 | Steps: 139 | Loss: 218.371178 | Dataset: ../german-speech-corpus/delete/swc/dev_swc.csv
I Saved new best validating model with loss 218.371178 to: /home/LTLab.lan/agarwal/.local/share/deepspeech/checkpoints/best_dev-5535
Epoch 3 |   Training | Elapsed Time: 0:17:53 | Steps: 1845 | Loss: inf
Epoch 3 | Validation | Elapsed Time: 0:00:35 | Steps: 139 | Loss: 322.072106 | Dataset: ../german-speech-corpus/delete/swc/dev_swc.csv
WARNING:tensorflow:From /home/LTLab.lan/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
I Early stop triggered as (for last 4 steps) validation loss: 322.072106 with standard deviation: 22.604229 and mean: 238.648019
I FINISHED optimization in 1:14:16.207693
I Restored variables from best validation checkpoint at /home/LTLab.lan/agarwal/.local/share/deepspeech/checkpoints/best_dev-5535, step 5535
Testing model on ../german-speech-corpus/delete/swc/test_swc.csv
Test epoch | Steps: 412 | Elapsed Time: 0:08:00
WARNING:tensorflow:From /home/LTLab.lan/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py:232: convert_variables_to_constants (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.convert_variables_to_constants
WARNING:tensorflow:From /home/LTLab.lan/agarwal/python-environments/env/lib/python3.5/site-packages/tensorflow/python/framework/graph_util_impl.py:245: extract_sub_graph (from tensorflow.python.framework.graph_util_impl) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.compat.v1.graph_util.extract_sub_graph
Test on ../german-speech-corpus/delete/swc/test_swc.csv - WER: 0.984189, CER: 0.952155, loss: 221.439163
--------------------------------------------------------------------------------
WER: 3.000000, CER: 1.833333, loss: 90.893661
 - src: "wurden"
 - res: "in den hundert"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 0.789474, loss: 41.020634
 - src: "umweltveränderungen"
 - res: "um ein"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.200000, loss: 77.087273
 - src: "array"
 - res: "er ende"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 2.000000, loss: 86.086899
 - src: "sex"
 - res: "in den "
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.100000, loss: 120.730904
 - src: "siebzehnte"
 - res: "es unendlich"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 1.250000, loss: 157.400894
 - src: "monotherapie"
 - res: "die eeeeeeeeeeeee"
--------------------------------------------------------------------------------
WER: 2.000000, CER: 4.250000, loss: 191.515320
 - src: "doch"
 - res: "es hunderttausende"
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 2.211713
 - src: "an"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 2.343423
 - src: "mit"
 - res: ""
--------------------------------------------------------------------------------
WER: 1.000000, CER: 1.000000, loss: 2.612154
 - src: "auf"
 - res: ""
--------------------------------------------------------------------------------
I Exporting the model...
I Models exported at ../models
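
Since the training loss stays infinite in every epoch above, one likely culprit is transcripts that CTC cannot align: characters missing from alphabet.txt, or transcripts longer than the audio can support. Below is a minimal, hypothetical sanity check over the importer's output CSVs; the column names (wav_filename, wav_filesize, transcript), the one-character-per-line alphabet format, and the ~50 feature frames per second estimate are assumptions rather than details confirmed in this issue.

# Hypothetical sanity check for the CSVs written by import_swc.py (not part of this issue).
# Assumes the usual DeepSpeech columns wav_filename, wav_filesize, transcript and an
# alphabet.txt with one character per line ("#" lines are comments).
import csv
import sys
import wave

def load_alphabet(path):
    with open(path, encoding="utf-8") as f:
        # Keep a literal space entry; drop only the trailing newline.
        return {line.rstrip("\n") for line in f if not line.startswith("#")}

def check(csv_path, alphabet):
    with open(csv_path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            transcript = row["transcript"]
            unknown = set(transcript) - alphabet
            if unknown:
                print(row["wav_filename"], "has characters outside the alphabet:", sorted(unknown))
            with wave.open(row["wav_filename"]) as wav:
                duration = wav.getnframes() / wav.getframerate()
            # CTC loss becomes infinite when the label sequence is longer than the number
            # of feature frames; ~50 frames per second is only a rough guess for
            # DeepSpeech's default feature settings.
            if len(transcript) > duration * 50:
                print(row["wav_filename"], "transcript may be too long for",
                      round(duration, 2), "seconds of audio")

if __name__ == "__main__":
    alphabet = load_alphabet(sys.argv[1])   # e.g. ../../../dependencies/alphabet.txt
    for csv_file in sys.argv[2:]:           # e.g. train_swc.csv dev_swc.csv test_swc.csv
        check(csv_file, alphabet)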

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (13 by maintainers)

Most upvoted comments

@AASHISHAG Regarding 1: I’ll take some of them for the filter rules - thanks! Regarding 2: Looks like the vocabulary.

If you imported TUDA, you should find the README under <import-dir>/german-speechdata-package-v2/README. The containing archive's URL is constructed like this:
https://github.com/mozilla/DeepSpeech/blob/85a61a3ab74aa28a08723236ddab740c7a9fa1e3/bin/import_tuda.py#L27-L29
Result: http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/german-speechdata-package-v2.tar.gz
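
For readers without the checkout at hand, the linked lines roughly amount to the following; the variable names here are illustrative, not the actual ones used in bin/import_tuda.py:

# Illustrative reconstruction of how the archive URL is built; the names are made up,
# only the resulting URL is taken from the comment above.
TUDA_PACKAGE = "german-speechdata-package-v2"
TUDA_BASE_URL = "http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/"
TUDA_ARCHIVE_URL = TUDA_BASE_URL + TUDA_PACKAGE + ".tar.gz"
# -> http://ltdata1.informatik.uni-hamburg.de/kaldi_tuda_de/german-speechdata-package-v2.tar.gz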

@AASHISHAG

I noticed that in the SWC script you use a “speaker” flag to identify the speakers, and I assume that you split the overall SWC data set into training, development, and test partitions in such a way that speakers or sentences do not overlap across the different sets. Could you please confirm?

Confirmed (for the speakers).

#2625 adds the article name and the speaker as CSV columns for debugging. This will let you verify that each speaker is restricted to one set. It also allows excluding “unknown” speakers (in case an unknown speaker is actually just an unidentified existing one). Be aware: there is no “sentence overlap” check, as the importer assumes that Wikipedia articles do not share identical sentences.
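
As an illustration of the check those extra columns enable, here is a minimal sketch; it assumes the column is literally named "speaker" and that the importer writes train_swc.csv, dev_swc.csv, and test_swc.csv, which may differ from what #2625 actually produces:

import csv

def speakers(csv_path, column="speaker"):
    # Collect the set of speaker IDs that appear in one importer CSV.
    with open(csv_path, encoding="utf-8") as f:
        return {row[column] for row in csv.DictReader(f)}

train = speakers("train_swc.csv")
dev = speakers("dev_swc.csv")
test = speakers("test_swc.csv")

# Disjoint partitions mean no speaker shows up in more than one set.
for name, overlap in [("train/dev", train & dev),
                      ("train/test", train & test),
                      ("dev/test", dev & test)]:
    print(name, "overlap:", sorted(overlap) or "none")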