openai-node: [Whisper] cannot call `createTranscription` function from Node.js due to File API
Describe the bug
Cannot call the `createTranscription` function like below:
...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile Error at the first argument
const response = await openai.createTranscription(audio, 'whisper-1');
This is because the `createTranscription` interface asks for a `File` object, which is mainly a browser API.
public createTranscription(file: File, model: string, prompt?: string, responseFormat?: string, temperature?: number, language?: string, options?: AxiosRequestConfig) {
    return OpenAIApiFp(this.configuration).createTranscription(file, model, prompt, responseFormat, temperature, language, options).then((request) => request(this.axios, this.basePath));
}
How can I use this function from Node.js? Thanks!
Node.js version: v18.14.2
MacOS Monterey
To Reproduce
...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile Error at the first argument
const response = await openai.createTranscription(audio, 'whisper-1');
Code snippets
No response
OS
MacOS
Node version
Node v18.14.2
Library version
openai v3.2.1
About this issue
- State: closed
- Created a year ago
- Reactions: 17
- Comments: 27
Commits related to this issue
- tried https://github.com/openai/openai-node/issues/77#issuecomment-1500899486 but weird fluent-ffmpeg error — committed to mkandan/dubdubs by mkandan a year ago
It seems OpenAI is hacking around MIME type discovery for the input audio file by using the `.path` property of a stream - it’s present on `fs.createReadStream()` by default, which is why that works and `Readable.from()` does not. You can do this:
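The snippet that originally followed this comment did not survive in the thread. A minimal sketch of the idea, under assumptions: wrap the in-memory buffer with `Readable.from()` and set a `path` property by hand so the stream carries a file name with an extension. The helper name `bufferToNamedStream` and the file name are hypothetical, and whether the v3 client accepts the result depends on the behavior described in the comment.

```javascript
const { Readable } = require('node:stream');

// Hypothetical helper: wraps an in-memory Buffer in a Readable stream and
// attaches a `path` property, mimicking what fs.createReadStream() sets.
function bufferToNamedStream(buffer, filename) {
  const stream = Readable.from(buffer);
  stream.path = filename; // assumption: the upload code reads the file name from here
  return stream;
}

// Placeholder bytes; in practice this would be real audio data.
const audioStream = bufferToNamedStream(Buffer.from('not real audio'), 'audio.mp3');

// With openai v3.2.1 the stream would then be passed in place of the File:
// const response = await openai.createTranscription(audioStream, 'whisper-1');
```

In a TypeScript file the stream would additionally need a cast or a suppression comment, since the signature still declares `File`.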
Add support for loading file from Blob, Stream or base64 encoded string.
Workaround at this time
+1 for support for other formats, as @rmtuckerphx mentioned, especially for those migrating from Google’s Speech-to-text (base64)
+1 please support other formats, I’m surprised this was overlooked
The “Readable” stream doesn’t work either, so if your file is in memory, the only way to upload it seems to be by writing to the disk first and then using createReadStream.
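The write-to-disk route described here can be sketched as follows. This is an illustration rather than code from the thread: the helper name `bufferToFileStream` and the `whisper-` directory prefix are hypothetical. The point it shows is that `fs.createReadStream()` yields a stream whose `path` already includes the file extension.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: persists a Buffer to a temporary file and returns a
// read stream for it. The caller chooses the extension so that the file name
// reflects the audio format.
function bufferToFileStream(buffer, extension) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'whisper-'));
  const filePath = path.join(dir, `audio.${extension}`);
  fs.writeFileSync(filePath, buffer);
  // The returned stream exposes `path`, set by createReadStream itself.
  return fs.createReadStream(filePath);
}

// Placeholder bytes; in practice this would be real audio data.
const fileStream = bufferToFileStream(Buffer.from('not real audio'), 'webm');

// With openai v3.2.1:
// const response = await openai.createTranscription(fileStream, 'whisper-1');
// The temporary file should be removed once the upload has finished.
```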
That API is such a burden to use today.
I came up with a solution to avoid storing the file on the server.
Thanks dude, the update fixed the issue 👍
When working with FormData, this works for me (using .webm) without storing a file on the server and/or using ffmpeg:
Server.js
Client.js
I’m using the MediaRecorder in this case, but a simple form would also work.
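The Server.js and Client.js snippets from this comment are not preserved here. As a rough server-side sketch of the same approach, and not the commenter's code, the following assumes Node.js 18+ (where `FormData`, `Blob` and `fetch` are globals) and posts directly to the REST transcription endpoint instead of going through the v3 client. The helper names `buildTranscriptionForm` and `transcribe` are hypothetical.

```javascript
// Hypothetical helper: builds the multipart body for a transcription request
// from in-memory audio bytes, without writing anything to disk.
function buildTranscriptionForm(audioBuffer, filename, mimeType) {
  const form = new FormData();
  // The third argument gives the part a file name, extension included.
  form.append('file', new Blob([audioBuffer], { type: mimeType }), filename);
  form.append('model', 'whisper-1');
  return form;
}

// Hypothetical helper: sends the form to the transcription endpoint.
// The API key is supplied by the caller.
async function transcribe(form, apiKey) {
  const res = await fetch('https://api.openai.com/v1/audio/transcriptions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Transcription failed with status ${res.status}`);
  return res.json();
}

// Placeholder bytes; in practice this would be the uploaded recording.
const form = buildTranscriptionForm(Buffer.from('not real audio'), 'audio.webm', 'audio/webm');
```

Calling `transcribe(form, process.env.OPENAI_API_KEY)` would perform the actual request; the environment variable name is only an example.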
I’m also facing this issue. I was able to convert from mp4 to mp3 using ffmpeg, but this isn’t an ideal solution and I’m hoping the API will be fixed.
I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.
As a note for anyone using TypeScript, you should use `@ts-expect-error` here in case this is ever fixed and you can update the functionality.
Thanks for the workaround. I am seeing a `400` response when I use this approach. The response indicates `Invalid file format.` The file format is `webm` and the same file works with `curl`.
Edit: Apologies for the spam. In my case, I had to rename `/path/to/audio` so that it included the file extension like `/path/to/audio.webm`
Hey guys, with the new update to the API in November 2023, the solutions above no longer work… I made a post about this in the OpenAI developer forum: https://community.openai.com/t/creating-readstream-from-audio-buffer-for-whisper-api/534380. Does anyone know a solution?
Here’s a streaming example that might help?
I’m piping an http .ogg audio request stream into ffmpeg to convert it into an mp3 audio stream which I pipe into an OpenAI transcription request.
This doesn’t help as it returns an error message saying “Type assertion expressions can only be used in TypeScript files.”
Hi all, we have an upcoming version v4.0.0 that overhauls file upload support extensively; please give it a try and let us know in the thread whether it suits your needs!
Nuxt3 has Cross-platform support for Node.js, Browsers, service-workers and more. Serverless support out of the box. This code works real nice:
Does anyone have an example with Vercel Edge Functions?