fairseq: Errors while replicating Example 3: ST on CoVoST

πŸ› Bug

Hi, thank you for providing such a nice toolkit. I'm currently doing research on speech translation and tried to replicate your experiment. I followed the instructions here, but could not reproduce your results.

There are four main problems:

  • Bug 1: Some python packages are missing
  • Bug 2: Some CoVoST2 audio files are empty
  • Bug 3: Path in config file is incorrect
  • Bug 4: Tuple indices must be integers

Detailed error messages are at the bottom of this report.

To Reproduce

Steps to reproduce the behavior (command-line options are the same as the provided ones, except that I didn't set --num-workers and --update-freq when executing fairseq-train):

  1. Install fairseq following the instructions on GitHub.
  2. Run examples/speech_to_text/prep_covost_data.py.
  3. You will see Bug 1.
  4. Install pandas, torchaudio, and sentencepiece using pip.
  5. Run examples/speech_to_text/prep_covost_data.py again.
  6. You will see Bug 2 in some languages (at least I encountered this problem for En, De and Es).
  7. Find and remove empty audio files from <covost root>/<lang>/raw/clips, and remove the corresponding rows from the tsv files (a cleanup sketch is given after this list).
  8. Run examples/speech_to_text/prep_covost_data.py again. It will finish successfully.
  9. Run fairseq-train command:
fairseq-train ${COVOST_ROOT} --train-subset train_asr_<lang> ...
  10. You will see Bug 3.
  11. Copy or rename the config file:
cp config_asr_<lang>.yaml config.yaml
  12. Change audio_root in config.yaml line 1:

change audio_root: <path to covost root>/<lang> to audio_root: <path to covost root>

  13. Run fairseq-train command:
fairseq-train ${COVOST_ROOT}/<lang> --train-subset train_asr_<lang> ...
  14. You will see Bug 4.
  15. Change fairseq/fairseq/models/transformer.py lines 809-817 to:
    # Workaround: index the NamedTuple positionally (field 0 is encoder_out,
    # field 1 is encoder_padding_mask) instead of by dict key.
    encoder_out[0]
    if (encoder_out is not None and len(encoder_out[0]) > 0)
    else None,
    encoder_out[1]
    if (
        encoder_out is not None
    )
    else None,
  16. Run the fairseq-train command again; training will start successfully.
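For step 7, here is a minimal cleanup sketch. It is only an illustration, not part of fairseq: the concrete paths and the "path" TSV column name are assumptions based on the Common Voice layout, so adapt them to your tree.

import csv
from pathlib import Path

# Placeholder locations; point these at <covost root>/<lang>/raw.
clips_dir = Path("covost_root/es/raw/clips")
tsv_path = Path("covost_root/es/raw/validated.tsv")

# Collect the names of empty (zero-byte) clips, then delete them.
empty = {p.name for p in clips_dir.glob("*.mp3") if p.stat().st_size == 0}
for name in empty:
    (clips_dir / name).unlink()

# Rewrite the TSV without the rows that reference a deleted clip.
with tsv_path.open(newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f, delimiter="\t")
    fields = reader.fieldnames
    rows = [r for r in reader if r["path"] not in empty]

with tsv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)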

Expected behavior

All commands finish without any error.

Environment

  • fairseq version: installed via the following commands on Nov. 27, 2020 (commit dea66cc):
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
  • PyTorch version: 1.7.0
  • OS: Ubuntu 20.04
  • Python version: 3.8.5
  • CUDA/cuDNN version: 10.2/8.0.3
  • GPU models and configuration: GeForce GTX 1080 Ti x 8

Error messages

Error messages for Bug 1
Traceback (most recent call last):
  File "fairseq/examples/speech_to_text/prep_covost_data.py", line 16, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
Traceback (most recent call last):
  File "fairseq/examples/speech_to_text/prep_covost_data.py", line 17, in <module>
    import torchaudio
ModuleNotFoundError: No module named 'torchaudio'
...
Traceback (most recent call last):
  File "fairseq/examples/speech_to_text/prep_covost_data.py", line 18, in <module>
    from examples.speech_to_text.data_utils import (
  File "<path to fairseq>/fairseq/examples/speech_to_text/data_utils.py", line 18, in <module>
    import sentencepiece as sp
ModuleNotFoundError: No module named 'sentencepiece'
Error messages for Bug 2
Fetching split train...
100%|██████████████████████████████████| 5.80G/5.80G [10:37<00:00, 9.76MB/s]
100%|██████████████████████████████████| 3.05M/3.05M [00:00<00:00, 3.66MB/s]
Extracting log mel filter bank features...
  0%|                                  | 0/79015 [00:00<?, ?it/s]<path to venv>/venv/lib/python3.8/site-packages/torchaudio/compliance/kaldi.py:574: UserWarning: The function torch.rfft is deprecated and will be removed in a future PyTorch release. Use the new torch.fft module functions, instead, by importing torch.fft and calling torch.fft.fft or torch.fft.rfft. (Triggered internally at  /pytorch/aten/src/ATen/native/SpectralOps.cpp:590.)
  fft = torch.rfft(strided_input, 1, normalized=False, onesided=True)
 37%|████████████                      | 29137/79015 [1:21:38<1:52:09,  7.41it/s]
formats: can't open input file `<path to covost root>/es/raw/clips/common_voice_es_19499893.mp3':
 37%|███████████                       | 29138/79015 [1:21:38<2:19:45,  5.95it/s]
Traceback (most recent call last):
  File "fairseq/examples/speech_to_text/prep_covost_data.py", line 294, in <module>
    main()
  File "fairseq/examples/speech_to_text/prep_covost_data.py", line 290, in main
    process(args)
  File "fairseq/examples/speech_to_text/prep_covost_data.py", line 222, in process
    for waveform, sample_rate, _, _, _, utt_id in tqdm(dataset):
  File "<path to venv>/venv/lib/python3.8/site-packages/tqdm/std.py", line 1193, in __iter__
    for obj in iterable:
  File "fairseq/examples/speech_to_text/prep_covost_data.py", line 201, in __getitem__
    waveform, sample_rate = torchaudio.load(path)
  File "<path to venv>/venv/lib/python3.8/site-packages/torchaudio/backend/sox_backend.py", line 48, in load
    sample_rate = _torchaudio.read_audio_file(
RuntimeError: Error opening audio file
Error messages for Bug 3
2020-11-30 14:36:44 | INFO | fairseq.data.audio.speech_to_text_dataset | Cannot find <path to covost root>/config.yaml
Traceback (most recent call last):
  File "<path to venv>/venv/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 392, in cli_main
    distributed_utils.call_main(cfg, main)
  File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 313, in call_main
    torch.multiprocessing.spawn(
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 3 terminated with the following error:
Traceback (most recent call last):
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 300, in distributed_main
    main(cfg, **kwargs)
  File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 66, in main
    task = tasks.setup_task(cfg.task)
  File "<path to fairseq>/fairseq/fairseq/tasks/__init__.py", line 44, in setup_task
    return task.setup_task(cfg, **kwargs)
  File "<path to fairseq>/fairseq/fairseq/tasks/speech_to_text.py", line 58, in setup_task
    raise FileNotFoundError(f"Dict not found: {dict_path}")
FileNotFoundError: Dict not found: <path to covost root>/dict.txt
Error messages for Bug 4
...
  warnings.warn(
epoch 001:   0%|                                                                         | 0/2 [00:00<?, ?it/s]
2020-11-30 14:40:54 | WARNING | fairseq.logging.progress_bar | tensorboard not found, please install with: pip install tensorboardX
2020-11-30 14:40:54 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):
  File "<path to venv>/venv/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 392, in cli_main
    distributed_utils.call_main(cfg, main)
  File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 313, in call_main
    torch.multiprocessing.spawn(
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 7 terminated with the following error:
Traceback (most recent call last):
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 300, in distributed_main
    main(cfg, **kwargs)
  File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 130, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 219, in train
    log_output = trainer.train_step(samples)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "<path to fairseq>/fairseq/fairseq/trainer.py", line 540, in train_step
    loss, sample_size_i, logging_output = self.task.train_step(
  File "<path to fairseq>/fairseq/fairseq/tasks/fairseq_task.py", line 428, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "<path to fairseq>/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 69, in forward
    net_output = model(**sample["net_input"])
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "<path to fairseq>/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 259, in forward
    decoder_out = self.decoder(
  File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "<path to fairseq>/fairseq/fairseq/models/transformer.py", line 693, in forward
    x, extra = self.extract_features(
  File "<path to fairseq>/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 381, in extract_features
    x, _ = self.extract_features_scriptable(
  File "<path to fairseq>/fairseq/fairseq/models/transformer.py", line 810, in extract_features_scriptable
    if (encoder_out is not None and len(encoder_out["encoder_out"]) > 0)
TypeError: tuple indices must be integers or slices, not str

/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 8 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 22 (11 by maintainers)

Most upvoted comments

@ZLKong did you make any changes to fairseq/fairseq/models/transformer.py (it seems to be different from the version in the master branch)?

Hi @kahne, I don't think I changed anything; please check this issue that I posted: #2983 (comment). The transformer.py code that I used is this: https://github.com/pytorch/fairseq/blob/0db28cdd0e50cad9c36e5e47ffceff40beaf6f60/fairseq/models/transformer.py#L807-L810

I see. encoder_out was a NamedTuple but was recently reverted back to a dictionary. However, the s2t_transformer still uses NamedTuple as the return type: https://github.com/pytorch/fairseq/blob/0db28cdd0e50cad9c36e5e47ffceff40beaf6f60/fairseq/models/speech_to_text/s2t_transformer.py#L317. It needs to be updated to a dict as well.

@kahne Thank you very much. How can I change return EncoderOut( to a dict?

return {
    "encoder_out": new_encoder_out,  # T x B x C
    "encoder_padding_mask": new_encoder_padding_mask,  # B x T
    "encoder_embedding": new_encoder_embedding,  # B x T x C
    "encoder_states": encoder_states,  # List[T x B x C]
    "src_tokens": None,
    "src_lengths": None,
}

@kahne Thanks!


Hi, I had the same issue and I've replaced the code with the one above, but I get this error:

epoch 001:   0%|          | 0/562 [00:00<?, ?it/s]
2020-12-16 11:06:29 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):
  File "/home/sarapapi/anaconda3/envs/nlp_venv/bin/fairseq-train", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq_cli/train.py", line 413, in cli_main
    distributed_utils.call_main(cfg, main)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/distributed_utils.py", line 336, in call_main
    main(cfg, **kwargs)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq_cli/train.py", line 138, in main
    valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
  File "/home/sarapapi/anaconda3/envs/nlp_venv/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq_cli/train.py", line 235, in train
    log_output = trainer.train_step(samples)
  File "/home/sarapapi/anaconda3/envs/nlp_venv/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/trainer.py", line 530, in train_step
    loss, sample_size_i, logging_output = self.task.train_step(
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/tasks/fairseq_task.py", line 430, in train_step
    loss, sample_size, logging_output = criterion(model, sample)
  File "/home/sarapapi/anaconda3/envs/nlp_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 69, in forward
    net_output = model(**sample["net_input"])
  File "/home/sarapapi/anaconda3/envs/nlp_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 259, in forward
    decoder_out = self.decoder(
  File "/home/sarapapi/anaconda3/envs/nlp_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/models/transformer.py", line 706, in forward
    x, extra = self.extract_features(
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 381, in extract_features
    x, _ = self.extract_features_scriptable(
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/models/transformer.py", line 820, in extract_features_scriptable
    x, layer_attn, _ = layer(
  File "/home/sarapapi/anaconda3/envs/nlp_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/modules/transformer_layer.py", line 373, in forward
    x, attn = self.encoder_attn(
  File "/home/sarapapi/anaconda3/envs/nlp_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/sarap/OneDrive/Desktop/Dottorato/End2End/fairseq/fairseq/modules/multihead_attention.py", line 303, in forward
    assert key_padding_mask.size(0) == bsz
AssertionError

Is it somewhat related? Thanks

@sarapapi Can you try again with the latest master branch? I will close this issue for now. Please feel free to open a new one if this is still not resolved.

All solved, thank you!

The training itself was successful, but I could not reproduce the original scores. For example, my WER for En ASR is 32.29, not 25.6. Do you have any idea why this is? Here are my scripts: Training command, Evaluation commands, config.yaml

@cromz22 Thanks for reporting the issue. This is caused by a bug in the latest Hydra configuration system: the arguments --wer-tokenizer 13a --wer-lowercase --wer-remove-punct are not passed into the WER scorer properly (hence the higher WER than with punctuation removal and lowercasing). I will make a fix shortly and let you know.
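To see why the missing normalization inflates the score, here is a small self-contained illustration. It is not fairseq's actual scorer; the wer and normalize helpers below are hypothetical names used only for this sketch.

import string

def wer(ref_words, hyp_words):
    # Word-level Levenshtein distance divided by reference length.
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[-1][-1] / len(ref_words)

def normalize(s):
    return s.lower().translate(str.maketrans("", "", string.punctuation))

ref, hyp = "Hello, world.", "hello world"
print(wer(ref.split(), hyp.split()))                        # 1.0: both words count as errors
print(wer(normalize(ref).split(), normalize(hyp).split()))  # 0.0 after lowercasing and punct removal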

Please pull the latest master branch for the bug fix.

It looks like you have to use:

return {
    "encoder_out": [x],  # T x B x C
    "encoder_padding_mask": [encoder_padding_mask],  # B x T
    "encoder_embedding": [],  # B x T x C
    "encoder_states": [],  # List[T x B x C]
    "src_tokens": [],
    "src_lengths": [],
}

i.e. the returned tensors should be wrapped in lists.
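For context, the values need to be lists because the decoder checks each list's length and then indexes element 0. The following is only a rough paraphrase of the consuming pattern in extract_features_scriptable, based on the transformer.py line quoted in the tracebacks above; the helper name unpack_encoder_out is mine, not fairseq's.

from typing import Dict, List, Optional
from torch import Tensor

def unpack_encoder_out(encoder_out: Optional[Dict[str, List[Tensor]]]):
    # An empty list means "not provided"; element 0 holds the actual tensor.
    enc = (
        encoder_out["encoder_out"][0]
        if (encoder_out is not None and len(encoder_out["encoder_out"]) > 0)
        else None
    )
    padding_mask = (
        encoder_out["encoder_padding_mask"][0]
        if (encoder_out is not None and len(encoder_out["encoder_padding_mask"]) > 0)
        else None
    )
    return enc, padding_mask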

I replaced them and got the following:

  0%|                                                                                 | 0/177 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/sshimizu/orange/speechtrans/fairseqS2T/venv/bin/fairseq-generate", line 33, in <module>
    sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-generate')())
  File "/mnt/orange/sshimizu/speechtrans/fairseqS2T/fairseq/fairseq_cli/generate.py", line 394, in cli_main
    main(args)
  File "/mnt/orange/sshimizu/speechtrans/fairseqS2T/fairseq/fairseq_cli/generate.py", line 50, in main
    return _main(cfg, sys.stdout)
  File "/mnt/orange/sshimizu/speechtrans/fairseqS2T/fairseq/fairseq_cli/generate.py", line 175, in _main
    tokenizer = task.build_tokenizer(args)
NameError: name 'args' is not defined

args is named cfg in the _main function, so I replaced it, and generation was successful. Thank you!
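For reference, the one-line change described above, based on the call shown in the traceback (nearby build_* calls may need the same treatment), is:

# fairseq_cli/generate.py, inside _main, where the config object is named cfg:
tokenizer = task.build_tokenizer(cfg)  # was: task.build_tokenizer(args)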

@cromz22 thanks for reporting the above issues! I will update the code shortly.