transformers: [Bug] whisper pipeline inference bug on transformers master branch
System Info
OS: Ubuntu 20.04
transformers version: master branch, installed with
pip install git+https://github.com/huggingface/transformers
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Run the following code:
import transformers
from packaging.version import Version
import pathlib

def whisper_pipeline():
    task = "automatic-speech-recognition"
    architecture = "openai/whisper-tiny"
    model = transformers.WhisperForConditionalGeneration.from_pretrained(architecture)
    tokenizer = transformers.WhisperTokenizer.from_pretrained(architecture)
    feature_extractor = transformers.WhisperFeatureExtractor.from_pretrained(architecture)
    if Version(transformers.__version__) > Version("4.30.2"):
        model.generation_config.alignment_heads = [[2, 2], [3, 0], [3, 2], [3, 3], [3, 4], [3, 5]]
    return transformers.pipeline(
        task=task, model=model, tokenizer=tokenizer, feature_extractor=feature_extractor
    )

def raw_audio_file():
    # The dataset file comes from https://github.com/mlflow/mlflow/blob/master/tests/datasets/apollo11_launch.wav
    datasets_path = "/path/to/apollo11_launch.wav"
    return pathlib.Path(datasets_path).read_bytes()

inference_config = {
    "return_timestamps": "word",
    "chunk_length_s": 60,
    "batch_size": 16,
}

whisper = whisper_pipeline()
raw_audio_file_data = raw_audio_file()
prediction = whisper(raw_audio_file_data, return_timestamps="word", chunk_length_s=60, batch_size=16)
The last line raises an error like:
>>> prediction = whisper(raw_audio_file_data, return_timestamps="word", chunk_length_s=60, batch_size=16)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 356, in __call__
return super().__call__(inputs, **kwargs)
File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1132, in __call__
return next(
File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 124, in __next__
item = next(self.iterator)
File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py", line 266, in __next__
processed = self.infer(next(self.iterator), **self.params)
File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1046, in forward
model_outputs = self._forward(model_inputs, **forward_params)
File "/home/weichen.xu/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 551, in _forward
generate_kwargs["num_frames"] = stride[0] // self.feature_extractor.hop_length
TypeError: unsupported operand type(s) for //: 'tuple' and 'int'
>>>
Note that this error only happens on the transformers GitHub master branch. With the latest released version, the above code works fine.
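For context, the traceback suggests that when the audio is chunked and batched, the stride that reaches _forward is a container of per-chunk (chunk_len, stride_left, stride_right) tuples rather than a single tuple, so stride[0] is itself a tuple and the floor division fails. A minimal sketch of that type mismatch (the stride values and hop length below are illustrative assumptions, not taken from the pipeline internals):

hop_length = 160  # assumed WhisperFeatureExtractor hop length, for illustration only

# Unbatched case: a single (chunk_len, stride_left, stride_right) tuple works fine.
stride = (480000, 0, 8000)
num_frames = stride[0] // hop_length  # 3000

# Batched case: stride holds one tuple per chunk in the batch.
stride = ((480000, 0, 8000), (480000, 8000, 8000))
num_frames = stride[0] // hop_length
# TypeError: unsupported operand type(s) for //: 'tuple' and 'int'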
Expected behavior
My example code should not raise an error.
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Reactions: 3
- Comments: 17 (4 by maintainers)
Good news: this was fixed by https://github.com/huggingface/transformers/pull/28114 🥳
I also have the same issue; any update on this @sanchit-gandhi?
@josebruzzoni
I’ve had the same issue. @WeichenXu123's replies were very helpful, thanks man!
First, try setting the batch size to 1 if that’s not a problem.
Second, you can open the file that the error message points to in the third-from-last frame of the traceback. For me it says "/home/nofreewill/.local/lib/python3.10/site-packages/transformers/pipelines/automatic_speech_recognition.py", line 552, in _forward. So I opened it, went to line 552, and changed it according to @WeichenXu123's suggestion, from
generate_kwargs["num_frames"] = stride[0] // self.feature_extractor.hop_length
to
generate_kwargs["num_frames"] = stride[0][0] // self.feature_extractor.hop_length
And it now works with batch size > 1 as well.
Yes, it only happens when batch size > 1.
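For anyone who needs an immediate workaround without editing the installed package, the batch_size=1 suggestion above can be applied directly to the reproduction script. This is just a sketch reusing the whisper_pipeline and raw_audio_file helpers defined in the Reproduction section:

# Workaround sketch: drop batching so the stride stays a single tuple.
whisper = whisper_pipeline()
raw_audio_file_data = raw_audio_file()
prediction = whisper(
    raw_audio_file_data,
    return_timestamps="word",
    chunk_length_s=60,
    batch_size=1,  # avoids the code path that fails when batch_size > 1
)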