transformers.js: Error when trying MusicGen example - 127037464

System Info

  • Latest build of the v3 branch
  • Running in a web worker
  • MacBook Pro 14"
  • Brave browser

The error being caught is a number, which stays the same on each run: 127037464.

To remove as many variables as possible, I then tried a simpler version of the example. Unfortunately I saw the same error, just with a different number: Uncaught 168274888.
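For reference, the number surfaces when the promise from generate() rejects. A minimal sketch of the calling pattern, using the model and inputs set up as in the example below:

try {
  const audio_values = await model.generate({ ...inputs, max_new_tokens: 512 });
} catch (e) {
  // e arrives as a bare number (e.g. 127037464) rather than an Error object
  console.error('Caught:', e);
}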

Environment/Platform

  • Website/web-app
  • Browser extension
  • Server-side (e.g., Node.js, Deno, Bun)
  • Desktop app (e.g., Electron)
  • Other (e.g., VSCode extension)

Description

The MusicGen example generates an error instead of an audio array.

Reproduction

Steps taken to test:

git clone -b v3 https://github.com/xenova/transformers.js.git
cd transformers.js/
npm i
npm run build

Then I used the contents of dist as the js folder in this minimal example:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Musicgen</title>
  </head>
  <body>
	<script type="module">
		import { AutoTokenizer, MusicgenForConditionalGeneration } from './js/transformers.js';

		// Load tokenizer and model
		const tokenizer = await AutoTokenizer.from_pretrained('Xenova/musicgen-small');
		const model = await MusicgenForConditionalGeneration.from_pretrained(
		  'Xenova/musicgen-small', { dtype: 'fp32' }
		);

		// Prepare text input
		const prompt = '80s pop track with bassy drums and synth';
		const inputs = tokenizer(prompt);

		// Generate audio
		const audio_values = await model.generate({
		  ...inputs,
		  max_new_tokens: 512,
		  do_sample: true,
		  guidance_scale: 3,
		});

		console.log("audio_values: ", audio_values);
		/*
		// (Optional) Write the output to a WAV file.
		// Note: this import would need to move to the top of the module.
		import wavefile from './js/wavefile.js';

		const wav = new wavefile.WaveFile();
		wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);
		*/
	</script>
	
  </body>
</html>
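For completeness, here is a sketch of what the commented-out WAV step could look like in the browser, where fs is not available. It assumes ./js/wavefile.js is an ES module exposing the WaveFile class, and it triggers a download instead of writing to disk:

import { WaveFile } from './js/wavefile.js';

// Pack the Float32 samples into a mono, 32-bit float WAV
const wav = new WaveFile();
wav.fromScratch(1, model.config.audio_encoder.sampling_rate, '32f', audio_values.data);

// Offer the result as a download (the browser equivalent of fs.writeFileSync)
const blob = new Blob([wav.toBuffer()], { type: 'audio/wav' });
const link = document.createElement('a');
link.href = URL.createObjectURL(blob);
link.download = 'musicgen.wav';
link.click();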

About this issue

  • State: open
  • Created 3 months ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

Hi again! I’ve done some additional testing and added per-model dtypes and devices, so you can do the following:

const model = await MusicgenForConditionalGeneration.from_pretrained(model_id, {
    dtype: {
        text_encoder: 'q8', // or 'fp32'. Both seem to work well, but q8 provides 4x memory reduction.
        decoder_model_merged: 'q8', // IMPORTANT: otherwise, you'll get out-of-memory issues
        encodec_decode: 'fp32', // IMPORTANT: If not full-precision, quality won't be very good.
    },
    device: {
        text_encoder: 'webgpu', // much faster :)
        decoder_model_merged: 'wasm', // webgpu is slower at the moment due to inefficient buffer reuse. Will fix.
        encodec_decode: 'wasm', // webgpu is currently broken (known upstream bug in onnxruntime-web). Will be fixed soon.
    },
});

Also, I’ve merged the PR which improved quantization settings (so you don’t need to specify revision), so just remember to clear your cache in case it’s still using the old files.
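If the cache needs clearing programmatically, one option is the browser Cache Storage API. This assumes transformers.js stores files in the default bucket named 'transformers-cache' (verify the name under DevTools → Application → Cache Storage):

// Delete the cached model files so the new quantized weights are re-downloaded
await caches.delete('transformers-cache');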

The output is pretty good now!

https://github.com/xenova/transformers.js/assets/26504141/138eeb3e-adf9-4410-87e1-7ace0d618d2b

Next step will be adding token streaming so you can get the progress in a non-hacky way. 😃

Wow, that is amazing! 🔥 Great stuff! 🚀 The 8-bit quantized version still seems to have some issues (the audio is not perfect), but I’ll play around with a few more things to try to get it working better!

To answer your questions:

> I assume the ‘refs/pr/9’ version will replace the broken 8 bit model? So in future the ‘refs/pr/9’ addition is not needed?

That’s right 😃 I’ll do some more exploration of the effect different quantization settings have on the output, as well as trying out different settings for each sub-model (text-encoder, musicgen-decoder, encodec-decoder).

> Currently the audio array is found at audio_values.ort_tensor.cpuData. Is this path something you intend to abstract away, so it’s more in line with other pipelines that output audio?

You should be able to do audio_values.data to get it. When this works with the text-to-audio pipeline, the API will be much easier to interact with, including being able to save and play the audio with .save() and .play(), thanks to https://github.com/xenova/transformers.js/pull/682.
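For anyone following along, a purely hypothetical sketch of what that pipeline-level API might look like once MusicGen is wired into text-to-audio; the task name and the save() helper are assumptions based on the linked PR, not confirmed API:

import { pipeline } from './js/transformers.js';

// Hypothetical: MusicGen exposed through the text-to-audio pipeline
const synthesizer = await pipeline('text-to-audio', 'Xenova/musicgen-small');
const audio = await synthesizer('80s pop track with bassy drums and synth');

// Assumed helper from PR #682: persist the RawAudio output
await audio.save('musicgen.wav');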

> Is there a way to get progress updates during the generation?

We’re planning on updating the API to include support for a Streamer (docs in transformers), which will run a function whenever a new token is generated. Stay tuned 😃
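As a rough illustration, the put()/end() interface below mirrors the Python transformers streamer protocol; the streamer option name and shape here are assumptions, not the final transformers.js API:

// Hypothetical progress reporter following the put()/end() streamer protocol
class ProgressStreamer {
  constructor(total_tokens) {
    this.total_tokens = total_tokens;
    this.seen = 0;
  }
  put(tokens) {
    this.seen += 1;
    console.log(`generated ${this.seen}/${this.total_tokens} tokens`);
  }
  end() {
    console.log('generation complete');
  }
}

const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  streamer: new ProgressStreamer(512), // assumed option name
});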

> Could you explain a bit what the parameters like ‘do_sample’ do? Then I can add that info to the demo.

This enables sampling from the predicted probability distribution to produce the next token. If set to false, the model will generate “greedily” (choosing the most probable token at each step). do_sample=true means the model can generate a different song each time. For MusicGen, it’s highly encouraged to keep this set to true; otherwise the model can get “stuck” and produce noise. See here for more information.
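Concretely, using the snippet from the original post as a base (a sketch):

// Greedy decoding: deterministic, but MusicGen tends to get “stuck” and produce noise
const greedy = await model.generate({ ...inputs, do_sample: false });

// Sampling: draws from the predicted distribution, so each run yields a different song
const sampled = await model.generate({ ...inputs, do_sample: true });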

> Once that is clear, and if you’re OK with it, I’d like to share an update on Reddit LocalLlama so people can try it.

Absolutely! Go for it 😃

> If you’d like (read: if it would save you time) I could rework the online example on Github to become a demo for Transformers.js.

I think a demo will be good once we’ve got WebGPU support working (and would make everything significantly faster), so stay tuned for that!

That did it!

IT WORKS!

Oh thanks, I know what the issue is and I’ll fix it tomorrow!

Thanks for the tip. I tried it, but with guidance_scale set to null I unfortunately still get the error: 168274888. I should have gone for the 32GB MacBook…

> The code in the v3 thread was only tested with Node.js, as shown by the use of fs

I started to suspect as much.

> I will update you when it does work!

Rock’n. My code is now ready 😃

Hi there! This is due to an out-of-memory error, which is primarily caused by loading the model in full precision (fp32). The code in the v3 thread was only tested with Node.js, as shown by the use of fs. Fortunately, we’re almost done with the WebGPU implementation, which will work in the browser with fp16 quantization (possibly even lower).

I will update you when it does work!

One thing you could try is to set guidance_scale to null, as specifying a value > 1 doubles the batch size (and memory usage).
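Concretely, based on the snippet in the original post:

const audio_values = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: true,
  guidance_scale: null, // disables classifier-free guidance, halving the effective batch size
});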