peft: modules_to_save: "ValueError: Attempting to unscale FP16 gradients"
I’m trying to finetune llama with some expanded tokens, using resize_token_embeddings() and passing modules_to_save=['embed_tokens', 'lm_head'], but it seems there is some misconfiguration:
Traceback (most recent call last):
  File "/home/jonathanasdf/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/home/jonathanasdf/.local/lib/python3.10/site-packages/transformers/trainer.py", line 1962, in _inner_training_loop
    self.scaler.unscale_(self.optimizer)
  File "/home/jonathanasdf/.local/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 284, in unscale_
    optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
  File "/home/jonathanasdf/.local/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 212, in _unscale_grads_
    raise ValueError("Attempting to unscale FP16 gradients.")
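For reference, the kind of setup being described would look roughly like the sketch below; the model id, added tokens, and training arguments are illustrative assumptions, not the poster's exact script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

# Assumed model id and extra tokens, for illustration only.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.add_tokens(["<extra_0>", "<extra_1>"])

# Loading the base weights in fp16 is what later trips the GradScaler.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
)
model.resize_token_embeddings(len(tokenizer))

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, peft_config)

# fp16=True enables mixed precision; combined with fp16 trainable weights above,
# Trainer's scaler.unscale_ raises "Attempting to unscale FP16 gradients."
args = TrainingArguments(output_dir="out", fp16=True)
```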
Commits related to this issue
- DOC Troubleshooting for unscaling error with fp16 (#1336): some users ran into the issue of trying to use a model loaded in float16 with mixed precision, e.g. these issues: #341, #1249. — committed to huggingface/peft by BenjaminBossan 6 months ago
New idea: now the training finally works. Setting fp16=False would make the training super slow and not memory-friendly.
To avoid “ValueError: Attempting to unscale FP16 gradients”, just make sure every trainable parameter is of type torch.float32. In my case, it was just:
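(The code itself is not preserved in this copy of the thread; the following is a minimal sketch of the cast being described, assuming `model` and `torch` are already in scope.)

```python
# Upcast only the parameters that receive gradients to float32, so AMP's
# GradScaler never sees fp16 gradients; frozen weights can stay in fp16.
for param in model.parameters():
    if param.requires_grad:
        param.data = param.data.to(torch.float32)
```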
It seems like a bug on the PyTorch side.
So clever, dude. Thanks for your idea.
This is my use case test. It breaks with
raise ValueError("Attempting to unscale FP16 gradients.")
under the configs below, and there is no error for the other cases. I am confused about how to understand the relation between torch_dtype, fp16, and modules_to_save.
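The config tables from this comment were not captured above; as a hedged aside (not part of the original comment), the deciding factor across torch_dtype / fp16 / modules_to_save combinations is the dtype of the trainable parameters, which can be inspected directly:

```python
import torch

def report_trainable_dtypes(model):
    # With TrainingArguments(fp16=True), AMP's GradScaler raises
    # "Attempting to unscale FP16 gradients" if any of these are torch.float16.
    # torch_dtype controls the dtype the weights are loaded in, and
    # modules_to_save creates trainable copies of those modules in that same dtype.
    for name, param in model.named_parameters():
        if param.requires_grad:
            print(name, param.dtype)
```

If every trainable parameter prints as torch.float32, mixed precision with fp16=True trains fine; if the embed_tokens / lm_head copies print as torch.float16 (e.g. when loading with torch_dtype=torch.float16), the unscale error is expected.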
Thanks for providing an example. I tried it (using OPT) and it crashed even with modules_to_save=None. Checking the dtypes of the learnable parameters, they are fp16, so the crash is expected. Not sure what the source of the difference is, but either way, I think it’s safe to say that when loading in fp16, it’s best to cast the trainable weights to fp32. PR #1318 will introduce a convenience function cast_non_trainable_to_dtype to do this quickly.

Which one exactly do you mean? Note that for the case of loading the model in float16, you have to follow the advice given above.
A snippet that should work a little bit more generally:
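(The snippet itself did not make it into this copy of the thread; the following is a sketch of a general-purpose cast along those lines, not necessarily the exact code that was posted.)

```python
import torch
from torch import nn

def cast_trainable_params(model: nn.Module, dtype: torch.dtype = torch.float32) -> None:
    # General-purpose version of the fix above: upcast everything that will
    # receive gradients (LoRA weights, modules_to_save copies, etc.) to `dtype`
    # so that mixed-precision training no longer hits the unscale error.
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.to(dtype)
```

Typically this would be called once on the PEFT model, after get_peft_model(...) and before constructing the Trainer.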