transformers: [Bug] Whisper pipeline inference bug on transformers master branch

System Info

OS: Ubuntu 20.04

transformers version: master branch, installed with pip install git+https://github.com/huggingface/transformers

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Run the following code:

import transformers
from packaging.version import Version
import pathlib


def whisper_pipeline():
    task = "automatic-speech-recognition"
    architecture = "openai/whisper-tiny"
    model = transformers.WhisperForConditionalGeneration.from_pretrained(architecture)
    tokenizer = transformers.WhisperTokenizer.from_pretrained(architecture)
    feature_extractor = transformers.WhisperFeatureExtractor.from_pretrained(architecture)
    if Version(transformers.__version__) > Version("4.30.2"):
        # Newer transformers versions need alignment heads set on the
        # generation config to produce word-level timestamps.
        model.generation_config.alignment_heads = [[2, 2], [3, 0], [3, 2], [3, 3], [3, 4], [3, 5]]
    return transformers.pipeline(
        task=task, model=model, tokenizer=tokenizer, feature_extractor=feature_extractor
    )

def raw_audio_file():
    # The dataset file comes from https://github.com/mlflow/mlflow/blob/master/tests/datasets/apollo11_launch.wav
    datasets_path = "/path/to/apollo11_launch.wav"
    return pathlib.Path(datasets_path).read_bytes()


inference_config = {
    "return_timestamps": "word",
    "chunk_length_s": 60,
    "batch_size": 16,
}
whisper = whisper_pipeline()
raw_audio_file_data = raw_audio_file()
prediction = whisper(raw_audio_file_data, **inference_config)

The last line raises an error like:

>>> prediction = whisper(raw_audio_file_data, return_timestamps="word", chunk_length_s=60, batch_size=16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 356, in __call__
    return super().__call__(inputs, **kwargs)
  File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1132, in __call__
    return next(
  File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
    item = next(self.iterator)
  File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 266, in __next__
    processed = self.infer(next(self.iterator), **self.params)
  File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1046, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 551, in _forward
    generate_kwargs["num_frames"] = stride[0] // self.feature_extractor.hop_length
TypeError: unsupported operand type(s) for //: 'tuple' and 'int'

Note that this error only happens on the transformers GitHub master branch. With the released version, the above code works fine.
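
For what it's worth, here is a minimal sketch of the failure mode. The stride format and the concrete values are my assumptions read off the traceback, not taken from the pipeline source: with batching, stride appears to arrive as a list of per-chunk tuples, so stride[0] is a tuple rather than an int.

# Minimal sketch of the failure, assuming the batched pipeline passes `stride`
# as a list of per-chunk (chunk_len, stride_left, stride_right) tuples.
# hop_length=160 is the WhisperFeatureExtractor default; other values are illustrative.
hop_length = 160

# Unbatched: stride is a single tuple, so stride[0] is an int and // works.
stride = (480000, 0, 8000)
num_frames = stride[0] // hop_length  # 3000

# Batched: stride is a list of tuples, so stride[0] is itself a tuple.
stride = [(480000, 0, 8000), (480000, 8000, 8000)]
num_frames = stride[0] // hop_length  # TypeError: unsupported operand type(s) for //: 'tuple' and 'int'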

Expected behavior

My example code should not raise an error.

About this issue

  • Original URL
  • State: closed
  • Created 9 months ago
  • Reactions: 3
  • Comments: 17 (4 by maintainers)

Most upvoted comments

Also having the same issue; any update on this, @sanchit-gandhi?

@josebruzzoni

I've had the same issue. @WeichenXu123's replies were very helpful, thanks man!

First, try setting the batch size to 1 if that's not a problem for you.
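
For example, re-running the reproduction call with batch_size reduced to 1 and everything else unchanged:

prediction = whisper(raw_audio_file_data, return_timestamps="word", chunk_length_s=60, batch_size=1)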

Second, you can try opening the file that the error message points to, three rows from the end of the traceback. For me it says "/home/nofreewill/.local/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 552, in _forward. So I opened it, went to line 552, and changed the line according to @WeichenXu123's suggestion, from

generate_kwargs["num_frames"] = stride[0] // self.feature_extractor.hop_length

to

generate_kwargs["num_frames"] = stride[0][0] // self.feature_extractor.hop_length

And now it works with batch size > 1 as well.
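
Before editing site-packages by hand, you can check whether your installed copy still contains the affected expression. A small diagnostic sketch; the substring check is my own heuristic, not an official API:

import inspect
from transformers.pipelines.automatic_speech_recognition import AutomaticSpeechRecognitionPipeline

# Look for the buggy expression in the installed pipeline source.
src = inspect.getsource(AutomaticSpeechRecognitionPipeline._forward)
if "stride[0] // self.feature_extractor.hop_length" in src:
    print("installed version still contains the affected line")
else:
    print("affected line not found; the install may already be fixed")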

Thanks for the ping. My hunch is that this is due to batch_size being larger than 1. Just to confirm, does the same thing happen if you remove that argument?

Yes, it only happens when batch_size > 1.