transformers: audio pipeline utility ffmpeg_microphone_live doesn't work in Google Colab

System Info

Google Colab 2023/07/21 / Chrome 115.0.5790.114 / macOS 13.5 / Default MacBook microphone

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

  1. Make sure the microphone works and microphone access is enabled in Colab
  2. Follow https://huggingface.co/learn/audio-course/chapter7/voice-assistant

Expected behavior

Colab cell launch_fn(debug=True) should output scores while listening for the wake word as described in the tutorial. Instead, nothing happens. No error is shown and no audio is recorded. This makes the transcribe cell in the tutorial crash.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25 (12 by maintainers)

Most upvoted comments

I had the same problem with the recording not working, and just solved it for testing purposes.

You need to specify the input device to the ffmpeg_microphone() function:

  1. Find your microphone name by running this in cmd: ffmpeg -list_devices true -f dshow -i dummy
  2. Copy the name and edit line 75 of ….venv\Lib\site-packages\transformers\pipelines\audio_utils.py
  3. Instead of “default”, put “audio=input_device_name”:

    elif system == "Windows":
        format_ = "dshow"
        input_ = "audio=Microphone (High Definition Audio Device)"
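As an alternative to editing the installed package, here is a minimal sketch of building the ffmpeg capture command with an overridable input device. The function name `build_mic_command`, the `input_device` parameter, and the `DEFAULTS` table are hypothetical and not part of transformers; the per-platform defaults are assumptions that should be checked against your installed audio_utils.py.

```python
import platform

# Assumed per-platform (format, input) defaults; verify against your own
# installed transformers/pipelines/audio_utils.py before relying on them.
DEFAULTS = {
    "Linux": ("alsa", "default"),
    "Darwin": ("avfoundation", ":0"),
    "Windows": ("dshow", "default"),
}


def build_mic_command(sampling_rate, input_device=None, system=None):
    """Return an ffmpeg argument list that reads mono f32le audio from a microphone.

    Hypothetical helper: `input_device` overrides the platform default,
    e.g. "audio=<device name>" for dshow or ":1" for avfoundation.
    """
    system = system or platform.system()
    if system not in DEFAULTS:
        raise ValueError(f"Unsupported platform: {system}")
    format_, default_input = DEFAULTS[system]
    input_ = input_device if input_device is not None else default_input
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "quiet",
        "-f", format_, "-i", input_,       # capture backend and device
        "-ac", "1", "-ar", str(sampling_rate),  # mono, target sampling rate
        "-f", "f32le", "pipe:1",           # raw float32 samples on stdout
    ]


if __name__ == "__main__":
    # Device name taken from the snippet above; replace it with your own.
    print(build_mic_command(
        16000, "audio=Microphone (High Definition Audio Device)", "Windows"
    ))
```

The returned list can be passed to subprocess.Popen with stdout=subprocess.PIPE to read raw samples. Building the command separately makes it easy to try different device strings without touching site-packages.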

Edit: it seems the problem lies in the ffmpeg input setting on macOS, which should be set to “:1” instead.

I'm currently having problems on a Mac while trying to follow this guide. After granting Terminal access to the microphone, I get output, but it doesn’t seem to actually be listening to the microphone: it predicts only one class for speech classification, no matter what I say:

/Users/victor/anaconda3/envs/transformers/lib/python3.9/site-packages/transformers/models/audio_spectrogram_transformer/feature_extraction_audio_spectrogram_transformer.py:96: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /private/var/folders/sy/f16zz6x50xz3113nwtb9bvq00000gp/T/abs_9d63z49rj_/croot/pytorch_1681837279022/work/torch/csrc/utils/tensor_numpy.cpp:205.)
  waveform = torch.from_numpy(waveform).unsqueeze(0)
{'score': 0.05440586432814598, 'label': 'no'}
{'score': 0.05816075950860977, 'label': 'up'}
{'score': 0.07136523723602295, 'label': 'up'}
{'score': 0.09769058227539062, 'label': 'follow'}
{'score': 0.14641302824020386, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}
{'score': 0.1379959136247635, 'label': 'follow'}

Has anyone else had this issue? I'm not sure whether the warning message has anything to do with it. Is ffmpeg_microphone supposed to support macOS?
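One way to check which input index or name the microphone actually has is to ask ffmpeg to list its capture devices. The sketch below wraps that in Python; the helper names `list_devices_command` and `list_devices` are hypothetical, and it assumes ffmpeg is on the PATH. ffmpeg writes the device list to stderr and exits with a non-zero status, so the return code is deliberately ignored.

```python
import platform
import subprocess


def list_devices_command(system):
    """Return the ffmpeg argument list that prints capture devices (hypothetical helper)."""
    if system == "Darwin":
        # avfoundation lists video and audio devices with their indices
        return ["ffmpeg", "-hide_banner", "-f", "avfoundation",
                "-list_devices", "true", "-i", ""]
    if system == "Windows":
        # same command as quoted earlier in this thread
        return ["ffmpeg", "-hide_banner", "-list_devices", "true",
                "-f", "dshow", "-i", "dummy"]
    raise ValueError(f"No device-listing command defined for: {system}")


def list_devices(system=None):
    """Run ffmpeg and return its stderr, where the device list is printed."""
    system = system or platform.system()
    result = subprocess.run(
        list_devices_command(system), capture_output=True, text=True
    )
    return result.stderr


if __name__ == "__main__":
    print(list_devices())
```

On macOS the audio devices appear with bracketed indices in that output, and an input string such as ":1" selects audio device 1 with no video.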