transformers: Unable to quantize Meta's new AudioCraft MusicGen model
System Info
- Windows 11 64bit
- Python 3.10.12
- Torch v2.0.1+cu117
- Transformers v4.31.0
- audiocraft v0.0.2
- bitsandbytes v0.41.0
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Hi, I’m attempting to quantize Meta’s new MusicGen model with bitsandbytes (through the Transformers library) and I’ve run into an error raised during a deepcopy call. I’m not familiar with how PyTorch tensors handle deepcopy or why this error occurs, but I can side-step it with a hack and get a bit further, until I reach another error, this time with the Transformers library.
The first error:
>>> from transformers import AutoProcessor, MusicgenForConditionalGeneration
bin C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
>>> processor = AutoProcessor.from_pretrained("facebook/musicgen-small", load_in_8bit=True)
>>> model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small", load_in_8bit=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 1599, in from_pretrained
return super().from_pretrained(pretrained_model_name_or_path, *model_args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\modeling_utils.py", line 2719, in from_pretrained
modules_to_not_convert = get_keys_to_not_convert(model)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\utils\bitsandbytes.py", line 257, in get_keys_to_not_convert
tied_model = deepcopy(model) # this has 0 cost since it is done inside `init_empty_weights` context manager`
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 146, in deepcopy
y = copier(x, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 146, in deepcopy
y = copier(x, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 146, in deepcopy
y = copier(x, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 146, in deepcopy
y = copier(x, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 146, in deepcopy
y = copier(x, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 297, in _reconstruct
value = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 271, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 146, in deepcopy
y = copier(x, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\copy.py", line 153, in deepcopy
y = copier(memo)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\_tensor.py", line 86, in __deepcopy__
raise RuntimeError(
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
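The repeating `_reconstruct` / `_deepcopy_dict` frames above are Python's `copy.deepcopy` walking down through nested objects until it reaches one whose `__deepcopy__` method raises. The following is a minimal pure-Python sketch of that mechanism, not PyTorch code: `FakeNonLeafTensor` and `FakeModule` are hypothetical stand-ins invented for illustration, and the real check lives in `torch/_tensor.py` as shown in the traceback.

```python
import copy


class FakeNonLeafTensor:
    """Hypothetical stand-in for a tensor that refuses to be deep-copied."""

    def __deepcopy__(self, memo):
        # Mirrors the message in the traceback; the real check is in torch/_tensor.py.
        raise RuntimeError(
            "Only Tensors created explicitly by the user (graph leaves) "
            "support the deepcopy protocol at the moment"
        )


class FakeModule:
    """Hypothetical stand-in for a module that keeps children in its __dict__."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


# Three levels of nesting with the offending object at the bottom,
# similar in shape to a model -> decoder -> layer -> weight hierarchy.
model = FakeModule(decoder=FakeModule(layer=FakeModule(weight=FakeNonLeafTensor())))


def try_deepcopy(obj):
    """Return 'ok' if obj can be deep-copied, otherwise the error text."""
    try:
        copy.deepcopy(obj)
        return "ok"
    except RuntimeError as err:
        return f"failed: {err}"


print(try_deepcopy(FakeModule(x=1)))  # a module with only plain values copies fine
print(try_deepcopy(model))            # one bad attribute deep inside breaks the whole copy
```

A single attribute that rejects deepcopy, however deeply nested, is enough to make the copy of the whole model fail, which is consistent with the depth of the traceback above.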
The hack (in place of the tied_model = deepcopy(model) line from the traceback above):
torch.save(model, "temp.pt")
tied_model = torch.load("temp.pt")
The second error after using the hack:
>>> from transformers import AutoProcessor, MusicgenForConditionalGeneration
bin C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\bitsandbytes\libbitsandbytes_cuda117.dll
>>> processor = AutoProcessor.from_pretrained("facebook/musicgen-small", load_in_8bit=True)
>>> model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small", load_in_8bit=True)
>>> inputs = processor(text=["80s pop track with bassy drums and synth"], padding=True, return_tensors="pt")
>>> audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 2430, in generate
outputs = self.sample(
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\generation\utils.py", line 2642, in sample
outputs = self(
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 1916, in forward
decoder_outputs = self.decoder(
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 1029, in forward
outputs = self.model(
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 938, in forward
decoder_outputs = self.decoder(
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 848, in forward
layer_outputs = decoder_layer(
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\transformers\models\musicgen\modeling_musicgen.py", line 394, in forward
hidden_states = self.self_attn_layer_norm(hidden_states)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\accelerate\hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\modules\normalization.py", line 190, in forward
return F.layer_norm(
File "C:\Users\fkdlam\anaconda3\envs\audiocraft\lib\site-packages\torch\nn\functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
This is the same code provided in an example for generating music in the Transformers documentation, except I’ve added the load_in_8bit flag. I’m not sure how to fix this one though. I’ve created an issue in the bitsandbytes repository too.
Expected behavior
Being able to run the quantized MusicGen model with bitsandbytes and obtain audio data output.
About this issue
- State: closed
- Created a year ago
- Comments: 19 (15 by maintainers)
OK, I think I figured it out. MusicGen can’t actually generate more than 30 seconds of audio, so the maximum number of tokens is 1506, and going over that number generates the weird noise. That works out to 502 tokens per 10 seconds. You input the first 20 seconds and then auto-prompt generate the last 10.
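The token arithmetic in this comment can be sketched as below. This is only an illustration under stated assumptions: the 502-tokens-per-10-seconds and 1506-token figures are taken from the comment, while `seconds_to_tokens` and `plan_chunks` are hypothetical helper names, and the windowing scheme (20 seconds of prompt, 10 seconds of new audio per call) is one reading of the comment. The sketch does not call the model.

```python
TOKENS_PER_10_SECONDS = 502  # figure quoted in the comment
MAX_TOKENS = 1506            # 30-second limit quoted in the comment


def seconds_to_tokens(seconds):
    """Convert a duration in seconds to a token count (502 tokens per 10 seconds)."""
    return round(seconds * TOKENS_PER_10_SECONDS / 10)


def plan_chunks(total_seconds, prompt_seconds=20, window_seconds=30):
    """Hypothetical helper: plan generation calls for a clip longer than one window.

    Returns a list of (prompt_tokens, new_tokens) pairs, one per call.
    Each call stays within MAX_TOKENS by reusing the previous audio as the
    prompt and generating only the remainder of the window.
    """
    if seconds_to_tokens(window_seconds) > MAX_TOKENS:
        raise ValueError("window exceeds the 1506-token limit")
    step = window_seconds - prompt_seconds  # new audio produced per follow-up call

    # First call: no audio prompt, fill up to one full window.
    first = min(total_seconds, window_seconds)
    chunks = [(0, seconds_to_tokens(first))]
    generated = first

    # Follow-up calls: prompt with earlier audio, generate the rest of the window.
    while generated < total_seconds:
        new = min(step, total_seconds - generated)
        chunks.append((seconds_to_tokens(prompt_seconds), seconds_to_tokens(new)))
        generated += new
    return chunks


print(seconds_to_tokens(30))  # 1506
print(plan_chunks(50))        # [(0, 1506), (1004, 502), (1004, 502)]
```

In each follow-up call the prompt and the new tokens add up to 1506, so no call exceeds the limit described in the comment.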