fairseq: [Wav2Vec2] Cannot load newly added Wav2Vec2 checkpoints

πŸ› Bug

A recent commit (https://github.com/pytorch/fairseq/commit/2513524a1604dbafcc4ea9cc5a99ae0aa4f19694) added two new fine-tuned Wav2Vec2 checkpoints, but there appears to be a problem with the saved config: the checkpoints cannot be loaded. For example, the following code fails:

import fairseq
model, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task([checkpoint_path], arg_overrides={"data": "path/to/dict"})

To Reproduce

The following colab reproduces the error (one just has to run all cells): https://colab.research.google.com/drive/13hJI4w8pOD33hxOJ_qwKkN9QqdKVH5IM?usp=sharing

Kindly pinging @alexeib here πŸ˜ƒ

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

@patrickvonplaten Hi, I'm hitting the same problem. Do you have a solution? Thank you. I ran this code:

model, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])

I got the error:

ConfigKeyError: Key 'target_dict' not in 'AudioPretrainingConfig'
	full_key: target_dict
	reference_type=Optional[AudioPretrainingConfig]
	object_type=AudioPretrainingConfig

I am still getting this error:

ConfigKeyError: Key 'eval_wer' not in 'AudioPretrainingConfig'
	full_key: eval_wer
	reference_type=Optional[AudioPretrainingConfig]
	object_type=AudioPretrainingConfig

while running the following code:

import torch
import fairseq
cp_path = '../w2v_large_lv_fsh_swbd_cv_ftsb300_updated.pt'
model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([cp_path])
model = model[0]
model.eval()
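For context, a ConfigKeyError with this full_key / reference_type / object_type format is raised by omegaconf when a saved config carries keys that the target schema (here AudioPretrainingConfig) does not define. A minimal stdlib-only mimic of that strict-merge behavior (hypothetical names; omegaconf itself not required):

```python
from dataclasses import dataclass, fields

@dataclass
class AudioPretrainingConfigSketch:
    # Like the real pretraining config, this deliberately lacks
    # the fine-tuning-only fields (eval_wer, target_dict, ...).
    data: str = ""

def strict_merge(schema_cls, saved):
    """Reject keys the schema does not define -- roughly what
    omegaconf's struct mode does when fairseq restores a
    checkpoint's task config."""
    allowed = {f.name for f in fields(schema_cls)}
    for key in saved:
        if key not in allowed:
            raise KeyError(f"Key '{key}' not in '{schema_cls.__name__}'")
    return schema_cls(**saved)

# The fine-tuned checkpoints carry eval_wer, so the merge fails:
try:
    strict_merge(AudioPretrainingConfigSketch, {"data": "/path", "eval_wer": True})
except KeyError as err:
    print(err)
```

This would explain why only the newly added fine-tuned checkpoints fail to load: only they carry the extra fine-tuning keys in their saved config.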
My environment (pip list):

Package                Version         Location
---------------------- --------------- --------------------------------------------
antlr4-python3-runtime 4.8
backcall               0.2.0
bitarray               2.3.7
certifi                2021.10.8
cffi                   1.15.0
colorama               0.4.4
Cython                 0.29.28
debugpy                1.5.1
decorator              5.1.1
entrypoints            0.3
fairseq                1.0.0a0+5175fd5 
hydra-core             1.0.7
ipykernel              6.4.1
ipython                7.31.1
ipython-genutils       0.2.0
jedi                   0.18.1
jupyter-client         7.1.2
jupyter-core           4.9.1
matplotlib-inline      0.1.2
nest-asyncio           1.5.1
numpy                  1.22.2
omegaconf              2.0.6
parso                  0.8.3
pexpect                4.8.0
pickleshare            0.7.5
pip                    21.2.4
portalocker            2.4.0
prompt-toolkit         3.0.20
protobuf               3.19.4
ptyprocess             0.7.0
pycparser              2.21
Pygments               2.11.2
python-dateutil        2.8.2
PyYAML                 6.0
pyzmq                  22.3.0
regex                  2022.1.18
sacrebleu              2.0.0
setuptools             58.0.4
six                    1.16.0
tabulate               0.8.9
tensorboardX           2.5
torch                  1.10.2
torchaudio             0.10.2
tornado                6.1
tqdm                   4.62.3
traitlets              5.1.1
typing_extensions      4.1.1
wcwidth                0.2.5
wheel                  0.37.1

Same problem here. Have you solved it?

You can work around this by cloning the repo and copying the missing parameters from the audio fine-tuning config into the audio pretraining config.
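In code terms, that workaround amounts to adding the fine-tuning-only fields to the pretraining config dataclass. A rough sketch (the field names come from the ConfigKeyErrors reported in this thread; the actual types and defaults live in fairseq's audio fine-tuning config and may differ):

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical patched version of fairseq's AudioPretrainingConfig
# (fairseq/tasks/audio_pretraining.py) -- NOT the real class.
@dataclass
class AudioPretrainingConfigPatched:
    data: str = ""
    # Fields the fine-tuned checkpoints expect but the stock
    # pretraining config lacks, per the errors reported above:
    eval_wer: bool = False
    target_dict: Optional[Any] = None
```

With the extra fields defined on the schema, the strict config merge no longer rejects the saved checkpoint config.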

If you're interested, the robust models are also available on the HF Hub: https://huggingface.co/models?arxiv=arxiv:2104.01027