peft: merge_and_unload error for adapter with a prefix

System Info

peft version: 0.9.0
transformers version: 4.37.2

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder
  • My own task or dataset (give details below)

Reproduction

I have an adapter model whose weights have a prefix (base_model.model); here is my merge code:

from peft import AutoPeftModelForCausalLM, AutoPeftModel
import sys

path_to_adapter = sys.argv[1]
new_model_directory = sys.argv[2]

model = AutoPeftModelForCausalLM.from_pretrained(
    path_to_adapter, # path to the output directory
    device_map="cpu",
    trust_remote_code=True
).eval()

merged_model = model.merge_and_unload()
# max_shard_size and safe_serialization are optional: they control checkpoint
# sharding and whether the model is saved as safetensors, respectively.
merged_model.save_pretrained(new_model_directory, safe_serialization=False)

After running it, I found that the saved model’s weights are identical to the base model’s. I assume this may be caused by my adapter’s weights having a prefix and not being merged correctly.
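
For reference, here is a minimal sketch of how this can be checked (the paths and the weight key are placeholders for my local setup, and it assumes the adapter was saved as adapter_model.safetensors):

import torch
from safetensors import safe_open
from transformers import AutoModelForCausalLM

# List the keys stored in the adapter file; the suspected
# base_model.model prefix should show up here if it is present.
with safe_open("path_to_adapter/adapter_model.safetensors", framework="pt", device="cpu") as f:
    for key in f.keys():
        print(key)

# Spot-check one weight that LoRA targets: it should differ between the
# base model and the merged model if the merge actually happened.
base = AutoModelForCausalLM.from_pretrained("path_to_base_model", trust_remote_code=True)
merged = AutoModelForCausalLM.from_pretrained("path_to_merged_model", trust_remote_code=True)
key = "model.layers.0.self_attn.q_proj.weight"  # hypothetical key, pick one your LoRA config targets
print(torch.equal(base.state_dict()[key], merged.state_dict()[key]))  # True means nothing was merged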

Expected behavior

How to correctly merge and save such an adapter?


Most upvoted comments

Thanks for sharing. This looks correct so far: When saving the adapter with PEFT, the adapter name is being removed from the key, so e.g. when the adapter name is "default" (which is the default), foo.layers.0.self_attn.q_proj.lora_A.default.weight would become foo.layers.0.self_attn.q_proj.lora_A.weight. I’m not 100% sure why it’s removed – probably it’s so that we can load the adapter with a different adapter name later, but whatever the reason, that’s what happens. In the key names you showed, there is no adapter name, so this is correct.
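
As a plain-string illustration of that renaming (not the actual PEFT implementation, which handles more layouts and adapter types):

adapter_name = "default"
in_memory_key = "foo.layers.0.self_attn.q_proj.lora_A.default.weight"

# On save, the adapter name is dropped from the key before the file is written.
stored_key = in_memory_key.replace(f".{adapter_name}.", ".")
print(stored_key)  # foo.layers.0.self_attn.q_proj.lora_A.weight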

Later, when we load the adapter, we have to inject the adapter name back into the key, which is happening in the code snippet cited above. Looking through the code, I don’t see what could go wrong for the adapter name to be injected twice, so that we end up with base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.default.weight. I thought that maybe the adapter name was not properly removed in the stored adapter file, but as you’ve shown, that’s not the case. Ideally, if you can somehow create a dummy adapter that causes this issue, without any weights trained on your data, and share it as a safetensors file, I could do further debugging.
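
A dummy adapter could be created with something like the sketch below (the tiny base checkpoint and target modules are only examples; use whatever comes closest to your setup):

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Any small public model works for a reproduction; this checkpoint is only an example.
base = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
peft_model = get_peft_model(base, config)

# Save the freshly initialized (untrained) adapter as safetensors so it can be
# shared without exposing any trained weights.
peft_model.save_pretrained("dummy_adapter", safe_serialization=True)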

I think I should re-train the base model with the LoRA config, re-convert the LoRA adapter to safetensors, and then re-load the adapter and re-merge it with the base model.

If that’s not too much effort for you, this could certainly be a solution. I would start with very little data and make sure that, this time around, loading the model works before spending too much time on training.

Alternatively, you could try modifying the PEFT code a little so that the double adapter name is removed. So e.g. in this line, add the following snippet:

peft_model_state_dict = {k.replace("default.default", "default"): v for k, v in peft_model_state_dict.items()}

It’s very blunt, but it would be interesting to see if it solves the problem.
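
For context, this would go into set_peft_model_state_dict in peft/utils/save_and_load.py, just before the rebuilt state dict is loaded into the model (a rough sketch against peft 0.9.0; the surrounding code and exact line may differ):

# after peft_model_state_dict has been rebuilt with the adapter name injected
peft_model_state_dict = {
    k.replace("default.default", "default"): v for k, v in peft_model_state_dict.items()
}
load_result = model.load_state_dict(peft_model_state_dict, strict=False)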