openai-python: [Whisper] Audio format errors on valid file

Describe the bug

Hello

I am trying to integrate the Whisper API into my Flask app. However, when I pass in the file received from the Flask endpoint, I get the following error:

openai.error.InvalidRequestError: Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']

However, loading the file in the interactive console works fine.

In [16]: r = openai.Audio.transcribe('whisper-1',open('../Downloads/sample.mp3','rb'))

In [17]: r
Out[17]:
<OpenAIObject at 0x192993c6750> JSON: {
  "text": "This episode is actually a co-production with another podcast called Digital Folklore, which is hosted by Mason Amadeus and Perry Carpenter. We've been doing a lot of our research together and our brainstorming sessions have been so thought-provoking, I wanted to bring them on so we could discuss the genre of analog horror together. So, why don't you guys introduce yourselves so we know who's who? Yeah, this is Perry Carpenter and I'm one of the hosts of Digital Folklore. And I'm Mason Amadeus and I'm the other host of Digital Folklore. And tell me, what is Digital Folklore? Yeah, so Digital Folklore is the evolution of folklore, you know, the way that we typically think about it. And folklore really is the product of basically anything that humans create that doesn't have a centralized canon. But when we talk about digital folklore, we're talking about..."
}

To Reproduce

  1. Create a Flask app.
  2. Add an endpoint that receives a valid audio file.
  3. Pass the bytes of the file, obtained with `request.files[fileName].stream.read()`, to the `openai.Audio.transcribe` method.

Code snippets

The endpoint code:


with tempfile.TemporaryFile() as temp_file:
    temp_file.write(audio_file)
    transcript_read = openai.Audio.transcribe("whisper-1", temp_file)
return transcript_read
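A minimal sketch for inspecting the temporary file that the endpoint above hands to the library. It does not call the API. It only reports two properties that may matter here: whether the file object's name carries an audio extension, and where the read position sits after the write. The `describe` helper and the placeholder bytes are hypothetical, and whether these properties cause the error is an assumption, not something confirmed in this report.

```python
import os
import tempfile


def describe(file_obj):
    """Return (extension of the file object's name, current position)."""
    name = getattr(file_obj, "name", "")
    # On some platforms TemporaryFile exposes an integer descriptor as its name.
    ext = os.path.splitext(name)[1] if isinstance(name, str) else ""
    return ext, file_obj.tell()


audio_file = b"\x00" * 16  # placeholder bytes standing in for the upload

with tempfile.TemporaryFile() as temp_file:
    temp_file.write(audio_file)
    info = describe(temp_file)

print(info)
```

If the extension is empty and the position equals the number of bytes written, the file object has no format hint in its name and any read that starts from the current position returns nothing.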

The ffprobe info of the file:

ffprobe version 4.4.1-full_build-www.gyan.dev Copyright (c) 2007-2021 the FFmpeg developers
  built with gcc 11.2.0 (Rev1, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-librist --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libsvtav1 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libglslang --enable-vulkan --enable-opencl --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mp3, from '.\Downloads\sample.mp3':
  Metadata:
    title           : Monsters in the Static
    comment         : We look at the subgenre of analog horror, where something sinister might be lurking in the horizontal lines and vertical holds of those old VHS tapes.
    lyrics-ENG      : <p>In the subgenre of analog horror, there’s something sinister or supernatural lurking in the horizontal lines and vertical holds in those old VHS tapes. Filmmaker <a href="https://wnuf.bigcartel.com/">Chris LaMartina</a> explains why he wanted his mov
    album           : Imaginary Worlds
    genre           : Podcast
    date            : 2020
    encoder         : Lavf58.76.100
  Duration: 00:00:50.05, start: 0.025057, bitrate: 128 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.13

OS

Windows 11

Python version

Python v10.5

Library version

0.27.2

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 5
  • Comments: 16 (1 by maintainers)

Most upvoted comments

I fought with this for a long time. I finally got it working by dropping the MediaRecorder() API on the frontend and switching to RecordRTC:

  const startRecording = () => {
    setIsRecording(true)
    navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
      const options = {
        type: 'audio',
        mimeType: 'audio/mp3',
        numberOfAudioChannels: 1,
        recorderType: RecordRTC.StereoAudioRecorder,
        checkForInactiveTracks: true,
        timeSlice: 5000,
        ondataavailable: (blob) => {
          socket.emit('audio', { buffer: blob })
        },
      }

      const recordRTC = new RecordRTC(stream, options)
      setRecorder(recordRTC)
      recordRTC.startRecording()
    })
  }

and it worked immediately.

I have found a workaround using Replicate's implementation. It requires exposing a link to the file, because Replicate only works with hyperlinks. I am hoping the issue will be resolved by the time I go live.

If you are testing locally, you can use ngrok to expose the file link.

On Fri, Mar 24, 2023 at 4:28 PM Sumeyye Yegen @.***> wrote:

Were you able to find a solution? I am getting the same error. It works very well in a Jupyter notebook, but I keep getting this error in the Hugging Face application.


Try with this code:

with tempfile.NamedTemporaryFile(suffix='.mp3') as temp_file:
    temp_file.write(audio_file)
    temp_file.flush()
    temp_file.seek(0)

    transcript_read = openai.Audio.transcribe("whisper-1", temp_file)
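An in-memory variant of the same idea, as a sketch only. It assumes that the library takes the upload filename from the file object's `name` attribute and that the format check relies on that filename's extension. Neither point is confirmed in this thread. The helper `as_named_upload` and the filename `recording.mp3` are hypothetical, and the actual API call is left commented out.

```python
import io


def as_named_upload(data: bytes, filename: str) -> io.BytesIO:
    """Wrap raw bytes in a file-like object that carries a filename."""
    buffer = io.BytesIO(data)  # a fresh BytesIO starts at position 0
    buffer.name = filename     # the extension acts as the format hint
    return buffer


audio_file = b"\x00" * 16  # placeholder bytes standing in for the upload
upload = as_named_upload(audio_file, "recording.mp3")

# Assumed usage, not executed here:
# transcript_read = openai.Audio.transcribe("whisper-1", upload)
```

This avoids touching the disk, which may be convenient in a request handler, at the cost of holding the whole upload in memory.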

I have a variation on the RecordRTC solution posted above. It shows how to start and stop recording, reset the recorder, and send the audio with an Ajax request (multipart form data):

if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
        navigator.mediaDevices.getUserMedia({ audio: true })
        .then((stream) => {
            const options = {
                type: 'audio',
                mimeType: 'audio/mp3',
                numberOfAudioChannels: 1,
                recorderType: RecordRTC.StereoAudioRecorder
            }
 
            const recordRTC = new RecordRTC(stream, options);

            $(document).on("click", "#record-button", () => {
                let recordButton = $("#record-button");
                // already recording, hit stop
                if(recordButton.attr("recording") === "true") {
                    recordButton.attr("recording", false)
                    recordButton.html("REC");

                    recordRTC.stopRecording(async () => {
                        let blob = await recordRTC.getBlob();
                        var form = new FormData();
                        form.append("file", blob);
                        $.ajax({
                            type: "POST",
                            data: form,
                            url: "",
                            processData: false,
                            contentType: false,
                            success: function (data) {
                                // ...
                                recordRTC.reset();
                            },
                            error: (err) => {
                                // ...
                                recordRTC.reset();
                            }
                        });
                    });
                }
                // not recording, hit play
                else {
                    //mediaRecorder.start();
                    recordButton.attr("recording", true);
                    recordButton.html("STOP");

                    recordRTC.startRecording();
                }
            });
        })
        // Error callback
        .catch((err) => {
            console.error(`The following getUserMedia error occurred: ${err}`);
        });
}