transformers: audio pipeline utility ffmpeg_microphone_live doesn't work in Google Colab

System Info

Google Colab 2023/07/21 / Chrome 115.0.5790.114 / macOS 13.5 / Default MacBook microphone

Who can help?

@sanchit-gandhi

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, …)
My own task or dataset (give details below)

Reproduction

Make sure the microphone works and that microphone access is enabled in Colab.
Follow https://huggingface.co/learn/audio-course/chapter7/voice-assistant

Expected behavior

The Colab cell launch_fn(debug=True) should print scores while listening for the wake word, as described in the tutorial. Instead, nothing happens: no error is shown and no audio is recorded. As a result, the transcription cell later in the tutorial crashes.

About this issue

State: closed
Created a year ago
Comments: 25 (12 by maintainers)

Most upvoted comments

Had the same problem with the recording not working. Just solved it for testing purposes.

You need to specify the input device to the ffmpeg_microphone() function:

Find your microphone's name with the command ffmpeg -list_devices true -f dshow -i dummy. Copy the name, then edit line 75 of ….venv\Lib\site-packages\transformers\pipelines\audio_utils.py: instead of “default”, put “audio=input_device_name”. For example:

    elif system == "Windows":
        format_ = "dshow"
        # was: input_ = "default"; use "audio=<name>" from the ffmpeg device list
        input_ = "audio=Microphone (High Definition Audio Device)"

Teapack1 on Oct 18, 2023
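The device-listing step in the comment above can be partly automated. The following is a minimal sketch, not part of transformers: the helper names are hypothetical, and it assumes the dshow listing uses one of two layouts depending on the ffmpeg version (names grouped under "DirectShow video devices" / "DirectShow audio devices" headers, or each quoted name followed by a "(audio)" / "(video)" tag).

```python
import re
from typing import List

# A double-quoted device name at the end of a log line, optionally followed by a
# "(audio)" / "(video)" style tag as printed by newer ffmpeg builds.
_NAME_RE = re.compile(r'"([^"]+)"\s*(?:\(([^)]*)\))?\s*$')


def list_dshow_audio_devices(ffmpeg_log: str) -> List[str]:
    """Extract audio device names from `ffmpeg -list_devices true -f dshow -i dummy`.

    Assumption: the listing uses one of two layouts, depending on the ffmpeg version:
    names grouped under "DirectShow video/audio devices" headers (older builds),
    or each name followed by a "(audio)"/"(video)" tag (newer builds).
    """
    devices = []
    section = None
    for line in ffmpeg_log.splitlines():
        if "DirectShow audio devices" in line:
            section = "audio"
            continue
        if "DirectShow video devices" in line:
            section = "video"
            continue
        if "Alternative name" in line:
            continue  # skip the "@device_..." moniker lines
        match = _NAME_RE.search(line)
        if match is None:
            continue
        name, tag = match.group(1), match.group(2)
        if (tag is not None and "audio" in tag) or (tag is None and section == "audio"):
            devices.append(name)
    return devices


def dshow_input(device_name: str) -> str:
    """Build the dshow input string, e.g. audio=Microphone (High Definition Audio Device)."""
    return f"audio={device_name}"
```

ffmpeg writes this listing to stderr, so on Windows the text to parse could come from the .stderr attribute of subprocess.run(["ffmpeg", "-list_devices", "true", "-f", "dshow", "-i", "dummy"], capture_output=True, text=True). The result of dshow_input is what would replace “default” in the snippet above.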

Edit: it seems the problem lies with the ffmpeg input device on macOS, which needs to be set to :1 instead.

vymao on Oct 25, 2023
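The per-OS choice of ffmpeg input, and where a value such as :1 would slot in, can be sketched as follows. This is a hypothetical helper, not the transformers API: the defaults table mirrors, as far as we know, the Linux/macOS/Windows branches of ffmpeg_microphone in audio_utils.py at the time, and the device override parameter is an illustration (the library itself had to be edited by hand, as described above). With avfoundation, an input of the form :N means "no video, audio device N"; the available indices can be listed with ffmpeg -f avfoundation -list_devices true -i "".

```python
import platform
from typing import List, Optional

# Hypothetical per-OS defaults: (ffmpeg format, ffmpeg input), assumed to mirror
# the branches of ffmpeg_microphone in transformers/pipelines/audio_utils.py.
DEFAULT_INPUTS = {
    "Linux": ("alsa", "default"),
    "Darwin": ("avfoundation", ":0"),  # avfoundation ":N" = no video, audio device N
    "Windows": ("dshow", "default"),
}


def ffmpeg_input_args(system: Optional[str] = None, device: Optional[str] = None) -> List[str]:
    """Return ['-f', format, '-i', input] for this OS, optionally overriding the device."""
    system = system or platform.system()
    if system not in DEFAULT_INPUTS:
        raise ValueError(f"Unsupported platform: {system}")
    format_, input_ = DEFAULT_INPUTS[system]
    if device is not None:
        input_ = device  # e.g. ":1" on macOS, "audio=<name>" on Windows
    return ["-f", format_, "-i", input_]


def microphone_command(sampling_rate: int = 16000,
                       system: Optional[str] = None,
                       device: Optional[str] = None) -> List[str]:
    """Build an ffmpeg command that streams mono float32 PCM from the microphone to stdout."""
    return [
        "ffmpeg",
        *ffmpeg_input_args(system, device),
        "-ac", "1",                 # downmix to mono
        "-ar", str(sampling_rate),  # resample to the model's sampling rate
        "-f", "f32le",              # raw little-endian float32 samples
        "-hide_banner",
        "-loglevel", "quiet",
        "pipe:1",                   # write audio to stdout
    ]
```

For example, microphone_command(16000, "Darwin", ":1") starts with ffmpeg -f avfoundation -i :1, and running it with subprocess.Popen(..., stdout=subprocess.PIPE) is one way to check by hand whether that device index actually produces audio.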

I’m currently having problems on Mac while trying to follow this guide. After granting Terminal access to the microphone, I get output, but the pipeline doesn’t seem to actually be listening to the microphone: it keeps predicting the same class for speech classification no matter what I say:

    /Users/victor/anaconda3/envs/transformers/lib/python3.9/site-packages/transformers/models/audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.py:96: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_9d63z49rj_/croot/pytorch_1681837279022/work/torch/csrc/utils/tensor_numpy.cpp:205.)
      waveform = torch.from_numpy(waveform).unsqueeze(0)
    {'score': 0.05440586432814598, 'label': 'no'}
    {'score': 0.05816075950860977, 'label': 'up'}
    {'score': 0.07136523723602295, 'label': 'up'}
    {'score': 0.09769058227539062, 'label': 'follow'}
    {'score': 0.14641302824020386, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}
    {'score': 0.1379959136247635, 'label': 'follow'}

Has anyone else had this issue? I’m not sure whether the warning message has anything to do with it. Is ffmpeg_microphone supposed to support macOS?

vymao on Oct 25, 2023
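The run of identical scores in the log above is consistent with the classifier receiving the same, possibly silent, input over and over, though that is only a guess. One way to test the suspicion that the microphone is not actually being heard is to measure the level of the raw chunks ffmpeg produces. The sketch below assumes raw little-endian float32 PCM, which as far as we know is what ffmpeg_microphone requests by default (format_for_conversion="f32le"); the helper names and the silence threshold are hypothetical.

```python
import math
import struct


def chunk_rms(raw: bytes) -> float:
    """RMS level of raw little-endian float32 PCM, the format ffmpeg emits with -f f32le."""
    n = len(raw) // 4  # 4 bytes per float32 sample; ignore a trailing partial sample
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}f", raw[: 4 * n])
    return math.sqrt(sum(s * s for s in samples) / n)


def looks_silent(raw: bytes, threshold: float = 1e-4) -> bool:
    """Heuristic: treat a chunk as silence if its RMS is below a (hypothetical) threshold."""
    return chunk_rms(raw) < threshold
```

Applied to successive reads from the stdout of an ffmpeg microphone process, a long run of chunks that all look silent while someone is speaking would suggest that ffmpeg is reading from the wrong or a muted device, rather than that the model is misbehaving.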