openai-node: [Whisper] cannot call `createTranscription` function from Node.js due to File API

Describe the bug

Cannot call the createTranscription function, as shown below:

...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile error on the first argument
const response = await openai.createTranscription(audio, 'whisper-1');

This is because the createTranscription interface expects a File, which is primarily a browser API.

public createTranscription(
  file: File,
  model: string,
  prompt?: string,
  responseFormat?: string,
  temperature?: number,
  language?: string,
  options?: AxiosRequestConfig,
) {
  return OpenAIApiFp(this.configuration)
    .createTranscription(file, model, prompt, responseFormat, temperature, language, options)
    .then((request) => request(this.axios, this.basePath));
}

How can I use this function from Node.js? Thanks!


Node.js version: v18.14.2
MacOS Monterey

To Reproduce

...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile error on the first argument
const response = await openai.createTranscription(audio, 'whisper-1');

Code snippets

No response

OS

MacOS

Node version

Node v18.14.2

Library version

openai v3.2.1

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 17
  • Comments: 27

Most upvoted comments

It seems OpenAI is hacking around MIME type discovery for the input audio file by using the .path property of a stream - it’s present on fs.createReadStream() by default, which is why that works and Readable.from() does not.

You can do this:

const audioReadStream = Readable.from(audioBuffer);
audioReadStream.path = 'conversation.wav';
const {data: {text}} = await openai.createTranscription(audioReadStream, 'whisper-1');

Add support for loading the file from a Blob, a Stream, or a base64-encoded string.
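
Until that lands, here's a minimal sketch for the base64 case, building on the .path trick above (the base64ToUpload helper name is illustrative, not part of the SDK):

import { Readable } from 'stream';

// Hypothetical helper: wrap a base64 string so the SDK treats it as a named file upload.
function base64ToUpload(base64: string, filename: string) {
  const stream = Readable.from(Buffer.from(base64, 'base64'));
  // @ts-expect-error the SDK sniffs the MIME type from the stream's .path
  stream.path = filename;
  return stream;
}

// usage: await openai.createTranscription(base64ToUpload(b64, 'audio.mp3') as any, 'whisper-1');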

Workaround at this time

await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a') as any, 'whisper-1');

+1 for support for other formats, as @rmtuckerphx mentioned, especially for those migrating from Google’s Speech-to-text (base64)

+1 please support other formats, I’m surprised this was overlooked

The “Readable” stream doesn’t work either, so if your file is in memory, the only way to upload it seems to be to write it to disk first and then use createReadStream.

That API is such a burden to use today.
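
For reference, a minimal sketch of that write-to-disk fallback (transcribeBuffer, the temp-file naming, and the .mp3 extension are illustrative assumptions, not from the thread):

import fs from 'fs';
import os from 'os';
import path from 'path';
import { OpenAIApi } from 'openai';

// Hypothetical helper: persist the in-memory audio so fs.createReadStream can be used.
async function transcribeBuffer(openai: OpenAIApi, buffer: Buffer) {
  const tmp = path.join(os.tmpdir(), `audio-${Date.now()}.mp3`);
  await fs.promises.writeFile(tmp, buffer);
  try {
    const resp = await openai.createTranscription(fs.createReadStream(tmp) as any, 'whisper-1');
    return resp.data.text;
  } finally {
    await fs.promises.unlink(tmp); // clean up the temp file either way
  }
}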

I came up with a solution to avoid storing the file on the server.


const axios = require("axios");
const ffmpeg = require("fluent-ffmpeg");
const { Readable, Writable } = require("stream");
const fs = require("fs");
const { Configuration, OpenAIApi } = require("openai");


async function callWhisper(url, languageCode) {
  try {
    // Get the audio from the given URL
    const response = await axios.get(url, {
      responseType: "arraybuffer",
    });
    // Make a stream out of the buffer
    const inputStream = arrayBufferToStream(response.data);
    // Reduce the bitrate to stay under the 25 MB limit and keep the audio within the size range the API accepts
    const resizedBuffer = await reduceBitrate(inputStream);
    // This step is necessary because the OpenAI API expects a stream as input for the audio file
    const resizedStream = bufferToReadableStream(resizedBuffer, "audio.mp3");
    const configuration = new Configuration({
      apiKey: process.env.OPEN_API_KEY,
    });

    const openai = new OpenAIApi(configuration);
    const prompt = "YOUR PROMPT";

    const resp = await openai.createTranscription(
      resizedStream,
      "whisper-1",
      prompt,
      "verbose_json",
      0.8,
      languageCode,
      { maxContentLength: Infinity, maxBodyLength: Infinity }
    );
    return resp.data;
  } catch (error) {
    console.error(error);
  }
}

callWhisper("YOUR_URL", "YOUR_LANGUAGE_CODE");


function reduceBitrate(inputStream) {
  return new Promise((resolve, reject) => {
    const outputChunks = [];
    ffmpeg(inputStream)
      .audioBitrate(64) // low quality. You can update that
      .on("error", reject)
      .on("end", () => resolve(Buffer.concat(outputChunks)))
      .format("mp3")
      .pipe(
        new Writable({
          write(chunk, encoding, callback) {
            outputChunks.push(chunk);
            callback();
          },
        })
      );
  });
}

function bufferToReadableStream(buffer, filename) {
  const readable = new Readable({
    read() {
      this.push(buffer);
      this.push(null);
    },
  });
  readable.path = filename;
  return readable;
}
function arrayBufferToStream(buffer) {
  const readable = new Readable({
    read() {
      this.push(Buffer.from(buffer));
      this.push(null);
    },
  });
  return readable;
}


Hi all, we have an upcoming version v4.0.0 that overhauls file upload support extensively; please give it a try and let us know in the thread whether it suits your needs!

Thanks dude, the update fixed the issue 👍

When working with FormData, this works for me (using .webm) without storing a file on the server and/or using ffmpeg:

Server.js

import { Readable } from 'stream';

// inside the request handler; `openai` is an initialized OpenAIApi client
const data = Object.fromEntries(await request.formData());
const fileStream = Readable.from(Buffer.from(await (data.audio as Blob).arrayBuffer()));
// @ts-expect-error workaround until OpenAI fixes the SDK
fileStream.path = 'audio.webm';
const transcription = await openai.createTranscription(
  fileStream as unknown as File,
  'whisper-1'
);

Client.js

let media: Blob[] = [];
let mediaRecorder: MediaRecorder;

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = async (e) => {
  if (!e.data?.size) return;
  media.push(e.data);
};

async function upload() {
  const formData = new FormData();
  formData.append('audio', new Blob(media, { type: 'audio/webm;codecs=opus' }), 'audio.webm');
  await fetch('/', { method: 'POST', body: formData });
}

I’m using the MediaRecorder in this case, but a simple form would also work.

<form method="POST" enctype="multipart/form-data">
  <input type="file" name="audio" />
  <button type="submit">Upload</button>
</form>

Safari records audio as audio/mp4 when using JavaScript’s MediaRecorder, and it doesn’t seem possible to trick openai with the stream.path technique above because the file would need to be converted first.

Does anyone have any ideas on how to sort this without converting the file?

I’m also facing this issue. I was able to convert from mp4 to mp3 using ffmpeg, but this isn’t an ideal solution and I’m hoping the API will be fixed.
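
For anyone who does end up converting, a minimal sketch of that mp4-to-mp3 step, reusing the fluent-ffmpeg pattern from the reduceBitrate example above (the mp4ToMp3 helper is illustrative):

import ffmpeg from "fluent-ffmpeg";
import { Readable, Writable } from "stream";

// Hypothetical helper: convert Safari's audio/mp4 into an in-memory mp3 buffer.
// Note: some mp4 files place the moov atom at the end and can't be read from a pipe;
// those may need to be written to disk before conversion.
function mp4ToMp3(inputStream: Readable): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    ffmpeg(inputStream)
      .format("mp3")
      .on("error", reject)
      .on("end", () => resolve(Buffer.concat(chunks)))
      .pipe(
        new Writable({
          write(chunk, _encoding, callback) {
            chunks.push(chunk);
            callback();
          },
        })
      );
  });
}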

I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.

> It seems OpenAI is hacking around MIME type discovery for the input audio file by using the .path property of a stream - it’s present on fs.createReadStream() by default, which is why that works and Readable.from() does not.
>
> You can do this:
>
> const audioReadStream = Readable.from(audioBuffer);
> audioReadStream.path = 'conversation.wav';
> const {data: {text}} = await openai.createTranscription(audioReadStream, 'whisper-1');

As a note for anyone using TypeScript, you should use @ts-expect-error here in case this is ever fixed, so you'll know when you can update the functionality.
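
Concretely, a sketch of the same workaround with the suppression comment in place:

const audioReadStream = Readable.from(audioBuffer);
// @ts-expect-error the SDK sniffs the MIME type from .path; this line flags itself once the types are fixed
audioReadStream.path = 'conversation.wav';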

> Workaround at this time
>
> await openai.createTranscription(fs.createReadStream('/path/to/audio') as any, 'whisper-1');

Thanks for the workaround. I am seeing a 400 response when I use this approach; the response indicates “Invalid file format”. The file format is webm, and the same file works with curl.

Edit:

Apologies for the spam. In my case, I had to rename /path/to/audio so that it included the file extension, like /path/to/audio.webm.

Hey guys, with the new update to the API in November 2023, the solutions above no longer work. I made a post about this in the OpenAI developer forum: https://community.openai.com/t/creating-readstream-from-audio-buffer-for-whisper-api/534380. Does anyone know a solution?
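
On the v4 SDK the upload story changed: here's a minimal sketch of an in-memory upload, assuming v4's exported toFile helper and the audio.transcriptions.create endpoint (names per the v4 docs; transcribeInMemory is mine):

import OpenAI, { toFile } from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function transcribeInMemory(audioBuffer: Buffer) {
  // toFile wraps an in-memory buffer as an uploadable file, so no temp file is needed;
  // the filename's extension tells the API the format
  const file = await toFile(audioBuffer, 'audio.mp3');
  const transcription = await openai.audio.transcriptions.create({
    file,
    model: 'whisper-1',
  });
  return transcription.text;
}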

> I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.

Here’s a streaming example that might help?

I’m piping an http .ogg audio request stream into ffmpeg to convert it into an mp3 audio stream which I pipe into an OpenAI transcription request.

import { spawn } from 'child_process'
import { Readable } from 'stream'
import { Configuration, OpenAIApi } from 'openai'

const openai = new OpenAIApi(new Configuration({ apiKey: process.env.OPENAI_API_KEY }))

async function transcribe(input: Readable) {
    // Converting .ogg to .mp3: ffmpeg reads from stdin ('-') and writes mp3 to stdout ('-')
    const proc = spawn('ffmpeg', ['-f', 'ogg', '-i', '-', '-f', 'mp3', '-'])
    input.pipe(proc.stdin)
    // @ts-expect-error Necessary to quack like a file upload
    proc.stdout.path = 'upload.mp3'
    const result = await openai.createTranscription(proc.stdout as any, 'whisper-1')
    return result.data.text
}

async function example() {
    const response = await fetch('http://example.com/audio.ogg')
    const nodeStream = Readable.fromWeb(response.body as any)
    const transcription = await transcribe(nodeStream)
    console.log('the audio file said:', transcription)
}

> Workaround at this time
>
> await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a') as any, 'whisper-1');

This doesn’t help, as it returns an error message saying “Type assertion expressions can only be used in TypeScript files.”
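
That error just means the as any cast is TypeScript-only syntax; in a plain .js file, drop the cast:

await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a'), 'whisper-1');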

Nuxt3 has cross-platform support for Node.js, browsers, service workers, and more, with serverless support out of the box. This code works nicely:

import { Configuration, OpenAIApi } from "openai";
import fs from "node:fs";

export default defineEventHandler(async (event) => {

    const config = useRuntimeConfig()

    const configuration = new Configuration({
        apiKey: config.OPENAI_API_KEY,
    });
    const openai = new OpenAIApi(configuration);

    try {

        const resp = await openai.createTranscription(
            // @ts-ignore
            fs.createReadStream("audio.mp3"), // Can use the file event here
            "whisper-1"
        );

        return resp.data
    } catch (error) {
        console.error('server error', error)
    }
})

Does anyone maybe have an example with Vercel Edge Functions?