transformers.js: Error when trying MusicGen example - 127037464
System Info
- Latest version of v3
- Running in webworker
- MacBook Pro 14
- Brave browser
The error being caught is a number, which stays the same on each run: 127037464.
To remove as many variables as possible, I then tried a simpler version of the example. Unfortunately, I saw the same error, just with a different number: Uncaught 168274888
Environment/Platform
- Website/web-app
- Browser extension
- Server-side (e.g., Node.js, Deno, Bun)
- Desktop app (e.g., Electron)
- Other (e.g., VSCode extension)
Description
The MusicGen example generates an error instead of an audio array.
Reproduction
Steps taken to test:
git clone -b v3 https://github.com/xenova/transformers.js.git
cd transformers.js/
npm i
npm run build
Then using the contents of dist as the js folder in this minimal example:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <meta http-equiv="X-UA-Compatible" content="ie=edge">
  <title>Musicgen</title>
</head>
<body>
  <script type="module">
    import { AutoTokenizer, MusicgenForConditionalGeneration } from './js/transformers.js';

    // Load tokenizer and model
    const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
    const model = await MusicgenForConditionalGeneration.from_pretrained(
      'Xenova/musicgen-small', { dtype: 'fp32' }
    );

    // Prepare text input
    const prompt = '80s pop track with bassy drums and synth';
    const inputs = tokenizer(prompt);

    // Generate audio
    const audio_values = await model.generate({
      ...inputs,
      max_new_tokens: 512,
      do_sample: true,
      guidance_scale: 3,
    });
    console.log("audio_values: ", audio_values);

    /*
    // (Optional) Write the output to a WAV file
    import { wavefile } from './js/wavefile.js';
    const wav = new wavefile.WaveFile();
    wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);
    */
  </script>
</body>
</html>
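For completeness, here is a sketch of how the commented-out WAV step could be finished in the browser. It assumes a browser-friendly build of wavefile is available at ./js/wavefile.js and exposed the same way as in the commented-out snippet above; the Blob/download part is standard browser API and is not from the thread.

// Sketch (assumption, not from the thread): write the generated samples to a
// WAV file and offer it as a download in the browser.
import { wavefile } from './js/wavefile.js';

const wav = new wavefile.WaveFile();
wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);

// Wrap the WAV bytes in a Blob and trigger a download.
const blob = new Blob([wav.toBuffer()], { type: 'audio/wav' });
const link = document.createElement('a');
link.href = URL.createObjectURL(blob);
link.download = 'musicgen.wav';
link.click();
URL.revokeObjectURL(link.href);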
About this issue
- State: open
- Created 3 months ago
- Comments: 21 (8 by maintainers)
Hi again! I’ve done some additional testing and added per-model dtypes and devices, so you can do the following:
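(The exact snippet was not preserved in this copy of the thread. Below is a sketch of what the per-model dtype/device options look like in v3; the sub-model names and the specific dtype choices are assumptions rather than the original code.)

// Sketch (assumed values): per-model dtypes and devices in transformers.js v3.
const model = await MusicgenForConditionalGeneration.from_pretrained(
  'Xenova/musicgen-small',
  {
    dtype: {
      text_encoder: 'q8',          // quantize the text encoder
      decoder_model_merged: 'q8',  // quantize the musicgen decoder to cut memory
      encodec_decode: 'fp32',      // keep the audio decoder in full precision
    },
    device: 'wasm',
  },
);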
Also, I’ve merged the PR which improved quantization settings (so you don’t need to specify revision), so just remember to clear your cache in case it’s still using the old files.
The output is pretty good now!
https://github.com/xenova/transformers.js/assets/26504141/138eeb3e-adf9-4410-87e1-7ace0d618d2b
Next step will be adding token streaming so you can get the progress in a non-hacky way. 😃
Wow that is amazing! 🔥 Great stuff! 🚀 The 8-bit quantized version still seems to have some issues (audio is not perfect), but I’ll play around with a few more things to try to get it working better!
To answer your questions:
That’s right 😃 I’ll do some more exploration of the effect different quantization settings have on the output, as well as trying out different settings for each sub-model (text-encoder, musicgen-decoder, encodec-decoder).
You should be able to do audio_values.data to get it. When this works with the text-to-audio pipeline, the API will be much easier to interact with, including being able to save and play the audio with .save() and .play(), thanks to https://github.com/xenova/transformers.js/pull/682.

We’re planning on updating the API to include support for a Streamer (docs in transformers), which will run a function whenever a new token is generated. Stay tuned 😃

This enables sampling the predicted probability distribution to produce the next token. If set to false, the model will generate “greedily” (choosing the most probable token at each step). do_sample=true means the model can generate different songs each generation. For musicgen, it’s highly encouraged to keep this set to true, otherwise the model can get “stuck” and produce noise. See here for more information.

Absolutely! Go for it 😃
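(Not from the thread: a minimal sketch of playing the returned samples directly with the Web Audio API, assuming audio_values.data is a mono Float32Array sampled at model.config.audio_encoder.sampling_rate.)

// Sketch: play the generated mono samples in the browser.
const samples = audio_values.data;
const sampling_rate = model.config.audio_encoder.sampling_rate;

const ctx = new AudioContext({ sampleRate: sampling_rate });
const buffer = ctx.createBuffer(1, samples.length, sampling_rate);
buffer.copyToChannel(new Float32Array(samples), 0);

const source = ctx.createBufferSource();
source.buffer = buffer;
source.connect(ctx.destination);
source.start();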
I think a demo will be good once we’ve got WebGPU support working (which will also make everything significantly faster), so stay tuned for that!
That did it!
IT WORKS!
Oh thanks, I know what the issue is and I’ll fix it tomorrow!
Thanks for the tip. I tried it, but with guidance_scale set to null I unfortunately still get the error 168274888. I should have gone for the 32GB MacBook…

I started to suspect as much.

Rock’n. My code is now ready 😃
Hi there! This is due to an out-of-memory error, which is primarily due to the fact that you’re loading in full precision (fp32). The code in the v3 thread was only tested with Node.js, as shown by the use of fs. Fortunately, we’re almost done with the WebGPU implementation, which will work in the browser with fp16 quantization (possibly even lower). I will update you when it does work!

One thing you could try is to set guidance_scale to null, as specifying a value > 1 will increase the batch size (+ memory) 2x.
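For reference, that change is a one-line tweak to the generate call from the original example (a sketch using the same parameters as above):

// guidance_scale > 1 runs a guided + unguided (classifier-free guidance) pass,
// doubling the batch size, so setting it to null roughly halves peak memory.
const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: true,
  guidance_scale: null, // disable classifier-free guidance to save memory
});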