fairseq: Errors while replicating Example 3: ST on CoVoST
🐛 Bug
Hi, thank you for providing such a nice toolkit. I'm currently working on research about speech translation and tried to replicate your experiment. I followed the instructions here, but could not reproduce your results.
There are four main problems:
- Bug 1: Some Python packages are missing
- Bug 2: Some CoVoST2 audio files are empty
- Bug 3: Path in config file is incorrect
- Bug 4: Tuple indices must be integers
Detailed error messages are at the bottom of this report.
To Reproduce
Steps to reproduce the behavior:
(Command-line options are the same as the provided ones, except that I didn't set `--num-workers` and `--update-freq` when executing `fairseq-train`.)
- Install fairseq following the instructions on GitHub.
- Run `examples/speech_to_text/prep_covost_data.py`. You will see Bug 1.
- Install pandas, torchaudio, and sentencepiece using pip.
- Run `examples/speech_to_text/prep_covost_data.py` again. You will see Bug 2 for some languages (at least I encountered this problem for En, De, and Es).
- Find and remove the empty audio files from `<covost root>/<lang>/raw/clips`; also remove the corresponding rows from the TSV files (a cleanup sketch follows this list).
- Run `examples/speech_to_text/prep_covost_data.py` again. It will finish successfully.
- Run the `fairseq-train` command: `fairseq-train ${COVOST_ROOT} --train-subset train_asr_<lang> ...` You will see Bug 3.
- Copy or rename the config file: `cp config_asr_<lang>.yaml config.yaml`.
- Change `audio_root` in line 1 of config.yaml from `audio_root: <path to covost root>/<lang>` to `audio_root: <path to covost root>`.
- Run the `fairseq-train` command: `fairseq-train ${COVOST_ROOT}/<lang> --train-subset train_asr_<lang> ...` You will see Bug 4.
- Change `fairseq/fairseq/models/transformer.py` lines 809-817 to:

  ```python
  encoder_out[0]  # EncoderOut field 0: encoder_out
  if (encoder_out is not None and len(encoder_out[0]) > 0)
  else None,
  encoder_out[1]  # EncoderOut field 1: encoder_padding_mask
  if (
      encoder_out is not None
  )
  else None,
  ```

- Run the `fairseq-train` command again; training will start successfully.
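For concreteness, the empty-file cleanup step above could look something like the following minimal sketch (not part of the original report; the placeholder paths and the `path` column name are assumptions based on the Common Voice/CoVoST 2 layout):

```python
# Hypothetical cleanup sketch: delete zero-byte clips and drop the rows
# that reference them from the Common Voice TSV files. Paths and the
# "path" column name are assumptions, not from the original report.
from pathlib import Path

import pandas as pd

clips_dir = Path("<covost root>/<lang>/raw/clips")  # placeholder, as in the report

# Zero-byte mp3 files are what the report calls "empty audio files".
empty = {p.name for p in clips_dir.glob("*.mp3") if p.stat().st_size == 0}
for name in empty:
    (clips_dir / name).unlink()

# Remove the corresponding rows from every TSV next to the clips directory.
for tsv in clips_dir.parent.glob("*.tsv"):
    df = pd.read_csv(tsv, sep="\t")
    if "path" in df.columns:
        df[~df["path"].isin(empty)].to_csv(tsv, sep="\t", index=False)
```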
Expected behavior
All commands finish without any error.
Environment
- fairseq version: installed via the following commands on Nov. 27, 2020 (commit dea66cc):

  ```
  git clone https://github.com/pytorch/fairseq
  cd fairseq
  pip install --editable ./
  ```
- PyTorch version: 1.7.0
- OS: Ubuntu 20.04
- Python version: 3.8.5
- CUDA/cuDNN version: 10.2/8.0.3
- GPU models and configuration: GeForce GTX 1080 Ti x 8
Error messages
Error messages for Bug 1
```
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 16, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 17, in <module>
import torchaudio
ModuleNotFoundError: No module named 'torchaudio'
...
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 18, in <module>
from examples.speech_to_text.data_utils import (
File "<path to fairseq>/fairseq/examples/speech_to_text/data_utils.py", line 18, in <module>
import sentencepiece as sp
ModuleNotFoundError: No module named 'sentencepiece'
```
Error messages for Bug 2
```
Fetching split train...
100%|██████████████████████████████████| 5.80G/5.80G [10:37<00:00, 9.76MB/s]
100%|██████████████████████████████████| 3.05M/3.05M [00:00<00:00, 3.66MB/s]
Extracting log mel filter bank features...
  0%|                                  | 0/79015 [00:00<?, ?it/s]<path to venv>/venv/lib/python3.8/site-packages/torchaudio/compliance/kaldi.py:574: UserWarning: The function torch.rfft is deprecated and will be removed in a future PyTorch release. Use the new torch.fft module functions, instead, by importing torch.fft and calling torch.fft.fft or torch.fft.rfft. (Triggered internally at /pytorch/aten/src/ATen/native/SpectralOps.cpp:590.)
  fft = torch.rfft(strided_input, 1, normalized=False, onesided=True)
 37%|████████████                      | 29137/79015 [1:21:38<1:52:09, 7.41it/s]
formats: can't open input file `<path to covost root>/es/raw/clips/common_voice_es_19499893.mp3':
 37%|████████████                      | 29138/79015 [1:21:38<2:19:45, 5.95it/s]
Traceback (most recent call last):
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 294, in <module>
main()
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 290, in main
process(args)
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 222, in process
for waveform, sample_rate, _, _, _, utt_id in tqdm(dataset):
File "<path to venv>/venv/lib/python3.8/site-packages/tqdm/std.py", line 1193, in __iter__
for obj in iterable:
File "fairseq/examples/speech_to_text/prep_covost_data.py", line 201, in __getitem__
waveform, sample_rate = torchaudio.load(path)
File "<path to venv>/venv/lib/python3.8/site-packages/torchaudio/backend/sox_backend.py", line 48, in load
sample_rate = _torchaudio.read_audio_file(
RuntimeError: Error opening audio file
```
Error messages for Bug 3
```
2020-11-30 14:36:44 | INFO | fairseq.data.audio.speech_to_text_dataset | Cannot find <path to covost root>/config.yaml
Traceback (most recent call last):
File "<path to venv>/venv/bin/fairseq-train", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 392, in cli_main
distributed_utils.call_main(cfg, main)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 313, in call_main
torch.multiprocessing.spawn(
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 3 terminated with the following error:
Traceback (most recent call last):
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 300, in distributed_main
main(cfg, **kwargs)
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 66, in main
task = tasks.setup_task(cfg.task)
File "<path to fairseq>/fairseq/fairseq/tasks/__init__.py", line 44, in setup_task
return task.setup_task(cfg, **kwargs)
File "<path to fairseq>/fairseq/fairseq/tasks/speech_to_text.py", line 58, in setup_task
raise FileNotFoundError(f"Dict not found: {dict_path}")
FileNotFoundError: Dict not found: <path to covost root>/dict.txt
```
Error messages for Bug 4
```
...
warnings.warn(
epoch 001: 0%| | 0/2 [00:00<?, ?it/s]2020-11-30 14:40:54 | WARNING | fairseq.logging.progress_bar | tensorboard not found, please install with: pip install tensorboardX
2020-11-30 14:40:54 | INFO | fairseq.trainer | begin training epoch 1
Traceback (most recent call last):
File "<path to venv>/venv/bin/fairseq-train", line 33, in <module>
sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-train')())
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 392, in cli_main
distributed_utils.call_main(cfg, main)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 313, in call_main
torch.multiprocessing.spawn(
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 7 terminated with the following error:
Traceback (most recent call last):
File "<path to venv>/venv/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "<path to fairseq>/fairseq/fairseq/distributed_utils.py", line 300, in distributed_main
main(cfg, **kwargs)
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 130, in main
valid_losses, should_stop = train(cfg, trainer, task, epoch_itr)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "<path to fairseq>/fairseq/fairseq_cli/train.py", line 219, in train
log_output = trainer.train_step(samples)
File "/usr/lib/python3.8/contextlib.py", line 75, in inner
return func(*args, **kwds)
File "<path to fairseq>/fairseq/fairseq/trainer.py", line 540, in train_step
loss, sample_size_i, logging_output = self.task.train_step(
File "<path to fairseq>/fairseq/fairseq/tasks/fairseq_task.py", line 428, in train_step
loss, sample_size, logging_output = criterion(model, sample)
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to fairseq>/fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 69, in forward
net_output = model(**sample["net_input"])
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
output = self.module(*inputs[0], **kwargs[0])
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to fairseq>/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 259, in forward
decoder_out = self.decoder(
File "<path to venv>/venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "<path to fairseq>/fairseq/fairseq/models/transformer.py", line 693, in forward
x, extra = self.extract_features(
File "<path to fairseq>/fairseq/fairseq/models/speech_to_text/s2t_transformer.py", line 381, in extract_features
x, _ = self.extract_features_scriptable(
File "<path to fairseq>/fairseq/fairseq/models/transformer.py", line 810, in extract_features_scriptable
if (encoder_out is not None and len(encoder_out["encoder_out"]) > 0)
TypeError: tuple indices must be integers or slices, not str
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 8 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
```
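The `TypeError` at the end of that traceback is just what happens when a `NamedTuple` is indexed with a string key. Here is a minimal standalone illustration (the field layout only loosely mirrors fairseq's `EncoderOut` and is for demonstration):

```python
from typing import NamedTuple

import torch

class EncoderOut(NamedTuple):
    encoder_out: torch.Tensor           # T x B x C
    encoder_padding_mask: torch.Tensor  # B x T

out = EncoderOut(torch.zeros(3, 1, 8), torch.zeros(1, 3, dtype=torch.bool))

print(out.encoder_out.shape)  # attribute access works
print(out[0].shape)           # positional access works (the workaround in the steps above)

try:
    out["encoder_out"]        # dict-style access does not
except TypeError as e:
    print(e)                  # tuple indices must be integers or slices, not str
```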
Comments from the thread
- "All solved, thank you!"
- "Please pull the latest master branch for the bug fix."
- "It looks like you have used: […] i.e. the returned tensors should be wrapped in lists."
- "I replaced them and got the following: […] `args` is named as `cfg` in the `main` function, so I replaced them, and the generation was successful. Thank you!"
- "@cromz22 thanks for reporting the above issues! I will update the code shortly."
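As a rough sketch of what "the returned tensors should be wrapped in lists" presumably means (assuming the dict-of-lists convention that the traceback's `encoder_out["encoder_out"]` lookup implies; everything beyond those two key names is an assumption):

```python
import torch

def encoder_forward_sketch(x: torch.Tensor, padding_mask: torch.Tensor) -> dict:
    # Each value is a list of tensors, so checks like
    # len(encoder_out["encoder_out"]) > 0 in transformer.py behave as intended.
    return {
        "encoder_out": [x],                      # T x B x C
        "encoder_padding_mask": [padding_mask],  # B x T
    }

enc = encoder_forward_sketch(torch.zeros(3, 1, 8), torch.zeros(1, 3, dtype=torch.bool))
assert len(enc["encoder_out"]) > 0
```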