peft: RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM: size mismatch for
Hi, thanks for this awesome library.
I posted an initial query here: https://huggingface.co/databricks/dolly-v2-3b/discussions/19
Reposting below.
I’m fine-tuning dolly-v2-3b. Training is invoked with deepspeed which uses this HF module.
Going by this example, my changes are:
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
model = prepare_model_for_int8_training(model, use_gradient_checkpointing=gradient_checkpointing)
# The dimension used by the LoRA update matrices
LORA_R = 4
# Scaling factor
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
# r controls the number of trainable LoRA parameters (alpha only scales the update), letting you trade off end performance against compute efficiency.
config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    bias="none",  # Specifies if the bias parameters should be trained
    task_type="CAUSAL_LM",
    # target_modules=["q", "v"],  # I tried with/without this line
)
model = get_peft_model(model, config)
It trains successfully, and I end up with a 677kB adapter:
Config looks OK:
from peft import PeftConfig
config = PeftConfig.from_pretrained(repo_name)
Out[19]: PeftConfig(peft_type='LORA', base_model_name_or_path='databricks/dolly-v2-3b', task_type='CAUSAL_LM', inference_mode=True)
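As a rough cross-check of the 677kB figure, the expected adapter size can be estimated from the LoRA shapes. The sketch below takes r=4 and an output width of 7680 from the `[7680, 4]` shape reported in the error message further down, and 32 layers from "layers 0 to 31". The input width of 2560 is an assumption (7680 / 3 for a fused query/key/value projection), and `lora_param_count` is a hypothetical helper, not part of peft.

```python
def lora_param_count(r, in_features, out_features, num_layers):
    """Count LoRA parameters when one linear layer per block gets an adapter."""
    # lora_A has shape (r, in_features); lora_B has shape (out_features, r).
    per_layer = r * in_features + out_features * r
    return per_layer * num_layers


# r and out_features come from the reported lora_B shape [7680, 4].
# in_features=2560 is an assumption (7680 / 3 for a fused QKV projection).
params = lora_param_count(r=4, in_features=2560, out_features=7680, num_layers=32)
print(params)            # 1310720
print(params * 2 / 1e6)  # about 2.6 MB at 2 bytes per parameter
```

Under these assumptions a complete adapter would be around 2.6 MB in 16-bit precision (about twice that in fp32), so a 677kB file may be a hint that not all weights were written.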
But, when I try to use the adapter with the base model, I get an error:
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
# Load the LoRA model
inference_model = PeftModel.from_pretrained(model, repo_name) # <-- error here
inference_model.eval()
inference_model
Error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
File <command-3660940350576262>:12
5 model = AutoModelForCausalLM.from_pretrained(
6 config.base_model_name_or_path,
7 device_map="auto",
8 torch_dtype=torch.bfloat16,
9 trust_remote_code=True,
10 )
11 # Load the LoRA model
---> 12 inference_model = PeftModel.from_pretrained(model, repo_name)
13 inference_model.eval()
14 inference_model
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/peft/peft_model.py:181, in PeftModel.from_pretrained(cls, model, model_id, adapter_name, is_trainable, **kwargs)
179 else:
180 model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config, adapter_name)
--> 181 model.load_adapter(model_id, adapter_name, **kwargs)
182 return model
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/peft/peft_model.py:384, in PeftModel.load_adapter(self, model_id, adapter_name, is_trainable, **kwargs)
380 adapters_weights = torch.load(
381 filename, map_location=torch.device("cuda" if torch.cuda.is_available() else "cpu")
382 )
383 # load the weights into the model
--> 384 set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
385 if (
386 (getattr(self, "hf_device_map", None) is not None)
387 and (len(set(self.hf_device_map.values()).intersection({"cpu", "disk"})) > 0)
388 and len(self.peft_config) == 1
389 ):
390 device_map = kwargs.get("device_map", "auto")
File /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/peft/utils/save_and_load.py:123, in set_peft_model_state_dict(model, peft_model_state_dict, adapter_name)
120 else:
121 raise NotImplementedError
--> 123 model.load_state_dict(peft_model_state_dict, strict=False)
124 if isinstance(config, PromptLearningConfig):
125 model.prompt_encoder[adapter_name].embedding.load_state_dict(
126 {"weight": peft_model_state_dict["prompt_embeddings"]}, strict=True
127 )
File /databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py:1671, in Module.load_state_dict(self, state_dict, strict)
1666 error_msgs.insert(
1667 0, 'Missing key(s) in state_dict: {}. '.format(
1668 ', '.join('"{}"'.format(k) for k in missing_keys)))
1670 if len(error_msgs) > 0:
-> 1671 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
1672 self.__class__.__name__, "\n\t".join(error_msgs)))
1673 return _IncompatibleKeys(missing_keys, unexpected_keys)
And then this is also printed out for layers 0 to 31.
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM: size mismatch for base_model.model.gpt_neox.layers.0.attention.query_key_value.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([7680, 4]).
Any pointers would be appreciated!
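The `torch.Size([0])` in the error above indicates that the checkpoint entry is an empty tensor. One way to confirm this before loading the adapter is to scan the saved state dict for zero-element entries. The helper below is a minimal sketch; `find_empty_entries` is a hypothetical name, and it only assumes each value exposes a `.shape`, as torch tensors do.

```python
from math import prod


def find_empty_entries(state_dict):
    """Return the names of entries whose shape contains zero elements."""
    empty = []
    for name, tensor in state_dict.items():
        shape = tuple(tensor.shape)
        # A shape of (0,) is what the error message reports as torch.Size([0]).
        if prod(shape) == 0:
            empty.append(name)
    return empty
```

For example, calling it on `torch.load("adapter_model.bin", map_location="cpu")` should return an empty list for a healthy adapter file.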
About this issue
- State: closed
- Created a year ago
- Reactions: 5
- Comments: 21 (1 by maintainers)
Try this, works for me
It uses the Trainer to save the full state dict in DeepSpeed's own format, and then DeepSpeed's utilities to gather the weights back on the CPU. Then we can store them using torch and pretend all of this never happened 😄
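The commenter's snippet is not reproduced here. A minimal sketch of the approach described might look like the following; it is not the original code. It assumes `engine` is the DeepSpeed engine wrapping the PEFT model, that the launcher exports a `RANK` environment variable, and that every rank has finished writing its shard by the time rank 0 reads them. `save_gathered_adapter` is a hypothetical helper name.

```python
import os


def save_gathered_adapter(engine, peft_model, checkpoint_dir, output_dir):
    """Write a ZeRO checkpoint, rebuild the full weights on CPU, save the adapter.

    Returns the path of the saved file on rank 0 and None on other ranks.
    """
    # Imported lazily so the heavy dependencies are only needed at call time.
    import torch
    from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
    from peft import get_peft_model_state_dict

    # Every rank must take part: each one writes its own shard of the state.
    engine.save_checkpoint(checkpoint_dir)

    # Only one process needs to reassemble the shards.
    # Assumption: RANK is set by the launcher and all shards are on disk by now.
    if int(os.environ.get("RANK", "0")) != 0:
        return None

    # Consolidate the sharded checkpoint into a full fp32 state dict on CPU.
    full_state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)
    # Keep only the adapter weights, with the key names peft expects on load.
    adapter_state_dict = get_peft_model_state_dict(peft_model, state_dict=full_state_dict)

    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, "adapter_model.bin")
    torch.save(adapter_state_dict, path)
    return path
```

How to obtain the engine depends on the transformers version, so treat that part as something to verify in your setup. The matching `adapter_config.json` still has to sit next to the saved file for `PeftModel.from_pretrained` to load the directory.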
The problem with using just the peft.save_pretrained method is that it only stores the weights of a single CUDA device, probably the one corresponding to the first Python process. All the other tensors have an empty shape.
Edit: The above solution works when "stage3_gather_16bit_weights_on_model_save": false. Alternatively, set "stage3_gather_16bit_weights_on_model_save": true in your DeepSpeed config, and that should make the Hugging Face Trainer output a usable state_dict…
I’m also facing the same issue. Removing DeepSpeed solves it, but I need to keep DeepSpeed to fine-tune larger models. If you find any solution, please share.
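For the alternative mentioned in the edit above, the relevant flag lives under `zero_optimization` in the DeepSpeed config. The fragment below is a minimal sketch written as a Python dict. Only the `stage3_gather_16bit_weights_on_model_save` key comes from this thread; the ZeRO stage and the `"auto"` values are assumptions about a typical Hugging Face Trainer setup and should be adapted to your own config.

```python
import json

# Assumed minimal ZeRO stage 3 config; merge the flag into your existing file.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        # Gather the sharded 16-bit weights onto one process when saving.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Serialise to JSON, the format DeepSpeed config files use.
config_text = json.dumps(ds_config, indent=2)
print(config_text)
```

The Hugging Face `TrainingArguments` accepts either a path to such a JSON file or the dict itself through its `deepspeed` argument.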
The adapter_model.bin isn’t available when the DeepSpeed processes exit. I erroneously pushed to the Hub from within the trainer after training, but as it’s parallelised, I only got a quarter of the full adapter.
I’ll have to dig into DeepSpeed and see if I can tell it to collate the adapters and make the result available in the output dir after a run. (Perhaps I can push by rank, so I end up with 4x bin files which I can somehow merge after the fact.)
But meanwhile, I’ll try the vanilla approach you proposed.
It works elegantly. Thanks a lot!