openai-node: [Whisper] cannot call `createTranscription` function from Node.js due to File API

Describe the bug

Cannot call the createTranscription function, as shown below:

...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile error on the first argument
const response = await openai.createTranscription(audio, 'whisper-1');

This is because the createTranscription interface expects a File, which is primarily a browser API.

public createTranscription(
  file: File,
  model: string,
  prompt?: string,
  responseFormat?: string,
  temperature?: number,
  language?: string,
  options?: AxiosRequestConfig,
) {
  return OpenAIApiFp(this.configuration)
    .createTranscription(file, model, prompt, responseFormat, temperature, language, options)
    .then((request) => request(this.axios, this.basePath));
}

How can I use this function from Node.js? Thanks!


Node.js version: v18.14.2
MacOS Monterey

To Reproduce

...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile error on the first argument
const response = await openai.createTranscription(audio, 'whisper-1');

Code snippets

No response

OS

MacOS

Node version

Node v18.14.2

Library version

openai v3.2.1

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 17
  • Comments: 27

Most upvoted comments

It seems OpenAI is hacking around MIME type discovery for the input audio file by using the .path property of a stream - it’s present on fs.createReadStream() by default, which is why that works and Readable.from() does not.

You can do this:

const audioReadStream = Readable.from(audioBuffer);
audioReadStream.path = 'conversation.wav';
const {data: {text}} = await openai.createTranscription(audioReadStream, 'whisper-1');

Add support for loading the file from a Blob, a Stream, or a base64-encoded string.
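
Until that lands, here's a minimal sketch for the base64 case, building on the .path trick above (the base64ToUpload helper name is illustrative, not part of the SDK):

import { Readable } from 'stream';

// Hypothetical helper: wrap a base64 string so the SDK treats it as a named file upload.
function base64ToUpload(base64: string, filename: string) {
  const stream = Readable.from(Buffer.from(base64, 'base64'));
  // @ts-expect-error the SDK sniffs the MIME type from the stream's .path
  stream.path = filename;
  return stream;
}

// usage: await openai.createTranscription(base64ToUpload(b64, 'audio.mp3') as any, 'whisper-1');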

Workaround at this time

await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a') as any, 'whisper-1');

+1 for support for other formats, as @rmtuckerphx mentioned, especially for those migrating from Google’s Speech-to-text (base64)

+1 please support other formats, I’m surprised this was overlooked

The “Readable” stream doesn’t work either, so if your file is in memory, the only way to upload it seems to be to write it to disk first and then use createReadStream.

That API is such a burden to use today.
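
For reference, a minimal sketch of that write-to-disk fallback (transcribeBuffer, the temp-file naming, and the .mp3 extension are illustrative assumptions, not from the thread):

import fs from 'fs';
import os from 'os';
import path from 'path';
import { OpenAIApi } from 'openai';

// Hypothetical helper: persist the in-memory audio so fs.createReadStream can be used.
async function transcribeBuffer(openai: OpenAIApi, buffer: Buffer) {
  const tmp = path.join(os.tmpdir(), `audio-${Date.now()}.mp3`);
  await fs.promises.writeFile(tmp, buffer);
  try {
    const resp = await openai.createTranscription(fs.createReadStream(tmp) as any, 'whisper-1');
    return resp.data.text;
  } finally {
    await fs.promises.unlink(tmp); // clean up the temp file either way
  }
}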

I came up with a solution to avoid storing the file on the server.


const axios = require("axios");
const ffmpeg = require("fluent-ffmpeg");
const { Readable, Writable } = require("stream");
const fs = require("fs");
const { Configuration, OpenAIApi } = require("openai");


async function callWhisper(url, languageCode) {
  try {
    // Get the audio from the given URL
    const response = await axios.get(url, {
      responseType: "arraybuffer",
    });
    // Make a stream out of the buffer
    const inputStream = arrayBufferToStream(response.data);
    // Reduce the bitrate to stay under the 25 MB limit and keep the audio within the size range the API accepts
    const resizedBuffer = await reduceBitrate(inputStream);
    // This step is necessary because the OpenAI API expects a stream as input for the audio file
    const resizedStream = bufferToReadableStream(resizedBuffer, "audio.mp3");
    const configuration = new Configuration({
      apiKey: process.env.OPEN_API_KEY,
    });

    const openai = new OpenAIApi(configuration);
    const prompt = "YOUR PROMPT";

    const resp = await openai.createTranscription(
      resizedStream,
      "whisper-1",
      prompt,
      "verbose_json",
      0.8,
      languageCode,
      { maxContentLength: Infinity, maxBodyLength: Infinity }
    );
    return resp.data;
  } catch (error) {
    console.error(error);
  }
}

callWhisper("YOUR_URL", "YOUR_LANGUAGE_CODE");


function reduceBitrate(inputStream) {
  return new Promise((resolve, reject) => {
    const outputChunks = [];
    ffmpeg(inputStream)
      .audioBitrate(64) // low quality. You can update that
      .on("error", reject)
      .on("end", () => resolve(Buffer.concat(outputChunks)))
      .format("mp3")
      .pipe(
        new Writable({
          write(chunk, encoding, callback) {
            outputChunks.push(chunk);
            callback();
          },
        })
      );
  });
}

function bufferToReadableStream(buffer, filename) {
  const readable = new Readable({
    read() {
      this.push(buffer);
      this.push(null);
    },
  });
  readable.path = filename;
  return readable;
}
function arrayBufferToStream(buffer) {
  const readable = new Readable({
    read() {
      this.push(Buffer.from(buffer));
      this.push(null);
    },
  });
  return readable;
}


Hi all, we have an upcoming version v4.0.0 that overhauls file upload support extensively; please give it a try and let us know in the thread whether it suits your needs!

Thanks dude, the update fixed the issue 👍

When working with FormData, this works for me (using .webm) without storing a file on the server and/or using ffmpeg:

Server.js

import { Readable } from 'stream';

// inside the request handler; `openai` is an initialized OpenAIApi client
const data = Object.fromEntries(await request.formData());
const fileStream = Readable.from(Buffer.from(await (data.audio as Blob).arrayBuffer()));
// @ts-expect-error workaround until OpenAI fixes the SDK
fileStream.path = 'audio.webm';
const transcription = await openai.createTranscription(
  fileStream as unknown as File,
  'whisper-1'
);

Client.js

let media: Blob[] = [];
let mediaRecorder: MediaRecorder;

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = async (e) => {
  if (!e.data?.size) return;
  media.push(e.data);
};

async function upload() {
  const formData = new FormData();
  formData.append('audio', new Blob(media, { type: 'audio/webm;codecs=opus' }), 'audio.webm');
  await fetch('/', { method: 'POST', body: formData });
}

I’m using the MediaRecorder in this case, but a simple form would also work.

<form method="POST" enctype="multipart/form-data">
  <input type="file" name="audio" />
  <button type="submit">Upload</button>
</form>

Safari records audio as audio/mp4 when using JavaScript’s MediaRecorder, and it doesn’t seem possible to trick openai with the stream.path technique above because the file would need to be converted first.

Does anyone have any ideas on how to sort this without converting the file?

I’m also facing this issue. I was able to convert from mp4 to mp3 using ffmpeg, but this isn’t an ideal solution and I’m hoping the API will be fixed.
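
For anyone who does end up converting, a minimal sketch of that mp4-to-mp3 step, reusing the fluent-ffmpeg pattern from the reduceBitrate example above (the mp4ToMp3 helper is illustrative):

import ffmpeg from "fluent-ffmpeg";
import { Readable, Writable } from "stream";

// Hypothetical helper: convert Safari's audio/mp4 into an in-memory mp3 buffer.
// Note: some mp4 files place the moov atom at the end and can't be read from a pipe;
// those may need to be written to disk before conversion.
function mp4ToMp3(inputStream: Readable): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    ffmpeg(inputStream)
      .format("mp3")
      .on("error", reject)
      .on("end", () => resolve(Buffer.concat(chunks)))
      .pipe(
        new Writable({
          write(chunk, _encoding, callback) {
            chunks.push(chunk);
            callback();
          },
        })
      );
  });
}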

I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.

> It seems OpenAI is hacking around MIME type discovery for the input audio file by using the .path property of a stream - it’s present on fs.createReadStream() by default, which is why that works and Readable.from() does not.
>
> You can do this:
>
> const audioReadStream = Readable.from(audioBuffer);
> audioReadStream.path = 'conversation.wav';
> const {data: {text}} = await openai.createTranscription(audioReadStream, 'whisper-1');

As a note for anyone using TypeScript, you should use @ts-expect-error here in case this is ever fixed, so you'll know when you can update the functionality.
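
Concretely, a sketch of the same workaround with the suppression comment in place:

const audioReadStream = Readable.from(audioBuffer);
// @ts-expect-error the SDK sniffs the MIME type from .path; this line flags itself once the types are fixed
audioReadStream.path = 'conversation.wav';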

> Workaround at this time
>
> await openai.createTranscription(fs.createReadStream('/path/to/audio') as any, 'whisper-1');

Thanks for the workaround. I am seeing a 400 response when I use this approach; the response indicates “Invalid file format”. The file format is webm, and the same file works with curl.

Edit:

Apologies for the spam. In my case, I had to rename /path/to/audio so that it included the file extension, like /path/to/audio.webm.

Hey guys, with the new update to the API in November 2023, the solutions above no longer work. I made a post about this in the OpenAI developer forum: https://community.openai.com/t/creating-readstream-from-audio-buffer-for-whisper-api/534380. Does anyone know a solution?
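
On the v4 SDK the upload story changed: here's a minimal sketch of an in-memory upload, assuming v4's exported toFile helper and the audio.transcriptions.create endpoint (names per the v4 docs; transcribeInMemory is mine):

import OpenAI, { toFile } from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function transcribeInMemory(audioBuffer: Buffer) {
  // toFile wraps an in-memory buffer as an uploadable file, so no temp file is needed;
  // the filename's extension tells the API the format
  const file = await toFile(audioBuffer, 'audio.mp3');
  const transcription = await openai.audio.transcriptions.create({
    file,
    model: 'whisper-1',
  });
  return transcription.text;
}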

> I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.

Here’s a streaming example that might help?

I’m piping an http .ogg audio request stream into ffmpeg to convert it into an mp3 audio stream which I pipe into an OpenAI transcription request.

import { spawn } from 'child_process'
import { Readable } from 'stream'
import { Configuration, OpenAIApi } from 'openai'

const openai = new OpenAIApi(new Configuration({ apiKey: process.env.OPENAI_API_KEY }))

async function transcribe(input: Readable) {
    // Converting .ogg to .mp3: ffmpeg reads from stdin ('-') and writes mp3 to stdout ('-')
    const proc = spawn('ffmpeg', ['-f', 'ogg', '-i', '-', '-f', 'mp3', '-'])
    input.pipe(proc.stdin)
    // @ts-expect-error Necessary to quack like a file upload
    proc.stdout.path = 'upload.mp3'
    const result = await openai.createTranscription(proc.stdout as any, 'whisper-1')
    return result.data.text
}

async function example() {
    const response = await fetch('http://example.com/audio.ogg')
    const nodeStream = Readable.fromWeb(response.body as any)
    const transcription = await transcribe(nodeStream)
    console.log('the audio file said:', transcription)
}

> Workaround at this time
>
> await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a') as any, 'whisper-1');

This doesn’t help, as it returns an error message saying “Type assertion expressions can only be used in TypeScript files.”
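
That error just means the as any cast is TypeScript-only syntax; in a plain .js file, drop the cast:

await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a'), 'whisper-1');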

Nuxt3 has cross-platform support for Node.js, browsers, service workers, and more, with serverless support out of the box. This code works nicely:

import { Configuration, OpenAIApi } from "openai";
import fs from "node:fs";

export default defineEventHandler(async (event) => {

    const config = useRuntimeConfig()

    const configuration = new Configuration({
        apiKey: config.OPENAI_API_KEY,
    });
    const openai = new OpenAIApi(configuration);

    try {

        const resp = await openai.createTranscription(
            // @ts-ignore
            fs.createReadStream("audio.mp3"), // Can use the file event here
            "whisper-1"
        );

        return resp.data
    } catch (error) {
        console.error('server error', error)
    }
})

Does anyone maybe have an example with Vercel Edge Functions?