transformers: ValueError: cannot find context for 'fork' when processor_with_lm.batch_decode(_logits)

System Info

## Environment info
- `transformers` version: 4.17.0
- Platform: Windows-10-10.0.22000-SP0
- Python version: 3.8.13
- PyTorch version (GPU?): 1.9.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No

Who can help?

@patrickvonplaten

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

To reproduce

  • The model I am using (Wav2Vec2.0 Large XLS-R 53 English):

  • Steps to reproduce the behavior:

  1. I fine-tuned Wav2Vec2 with an LM head, using WikiText to build a 5-gram LM. I downloaded the fine-tuned model directory locally and was able to run inference on my audio .wav file(s).
  2. The model files, a test audio file, and requirements.txt can be found here if needed to reproduce the problem.

Code snippet

from os import getcwd
from os.path import join as path_join
from pprint import pprint

import soundfile as sf
import torch
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
  

model_name = "jonatasgrosman/wav2vec2-large-xlsr-53-english"
model = Wav2Vec2ForCTC.from_pretrained(model_name)
processor_path = path_join(getcwd(), "stt_assets", "stt_model")
processor = Wav2Vec2ProcessorWithLM.from_pretrained(processor_path)
  
# take a random sample of 100 test utterances
dataset = load_dataset("timit_asr", split="test").shuffle().select(range(100))
char_translations = str.maketrans({"-": " ", ",": "", ".": "", "?": ""})


def prepare_example(example):
    example["speech"], _ = sf.read(example["file"])
    example["text"] = example["text"].translate(char_translations)
    example["text"] = " ".join(example["text"].split())  # clean up whitespace
    example["text"] = example["text"].lower()
    return example
  

dataset = dataset.map(prepare_example, remove_columns=["file"])
  
pprint(dataset)
# pass the decoded audio arrays produced by prepare_example
features = processor(dataset["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(**features).logits

# logits shape is torch.Size([100, 304, 33])
transcription = processor.batch_decode(logits)
# EXCEPTION IS RAISED in `processor.batch_decode()` ValueError: cannot find context for 'fork'
print(transcription)

Expected behavior

I expect to get a list of transcriptions from `processor.batch_decode()`, but instead I get the exception `ValueError: cannot find context for 'fork'`. I am using Windows 11.

I have tried to research it and I guess it is related to multiprocessing, but I have not been able to figure out how to solve it yet.
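For readers hitting the same message: it comes from Python's standard `multiprocessing` module, which raises this `ValueError` when asked for a start method the platform does not provide ("fork" does not exist on Windows). The following is a minimal sketch for checking this on your own machine; the helper name `start_method_available` is made up for illustration and is not part of `transformers`.

```python
import multiprocessing


def start_method_available(method: str) -> bool:
    """Return True if this platform supports the given start method.

    Hypothetical helper for illustration only.
    """
    try:
        # get_context raises ValueError("cannot find context for ...")
        # when the start method is not supported on this platform.
        multiprocessing.get_context(method)
        return True
    except ValueError:
        return False


# Start methods supported by the current platform
print(multiprocessing.get_all_start_methods())
print("fork available: ", start_method_available("fork"))
print("spawn available:", start_method_available("spawn"))
```

If "fork" is reported as unavailable, any code path that requests a "fork" context unconditionally will fail in the same way as the traceback described above.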

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

I think we can actually just change "fork" to "spawn" (no need for a try, … except, IMO). According to https://stackoverflow.com/questions/64095876/multiprocessing-fork-vs-spawn and some other docs, "spawn" is safe, and given that the child process runs LM-boosted decoding (which is always slow), making the switch should be fine.
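As a rough illustration of the suggestion above, here is a sketch of creating a worker pool from a "spawn" context using only the standard library. It is not the actual `transformers` change: the function name `spawn_context` is hypothetical, and the built-in `len` merely stands in for the real decoding work.

```python
import multiprocessing


def spawn_context():
    """Return a 'spawn' multiprocessing context.

    Hypothetical helper; 'spawn' is supported on Windows, macOS and Linux.
    """
    return multiprocessing.get_context("spawn")


if __name__ == "__main__":
    # The __main__ guard matters with "spawn": child processes re-import
    # the main module, so pool creation must not run at import time.
    with spawn_context().Pool(2) as pool:
        # len is a placeholder for the per-item work done in each worker
        lengths = pool.map(len, ["hello", "world!"])
    print(lengths)
```

With "spawn", each worker starts a fresh interpreter, so startup is slower than with "fork" and everything sent to the workers must be picklable. That overhead is the trade-off the comment above considers acceptable for slow, LM-boosted decoding.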

Okay, let us do it your way then. I have also created a custom dataset loader (for flac/wav audio files), a model fine-tuner, and an evaluator; if those would be helpful for the community, I would love to share them as well.

For now I will open a PR for spawn and fork.