peft: error merge_and_unload for adapter with a prefix
System Info
peft version: 0.9.0, transformers version: 4.37.2
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
I have an adapter model whose weights have a prefix (base_model.model). Here's my merge code:
from peft import AutoPeftModelForCausalLM
import sys

path_to_adapter = sys.argv[1]
new_model_directory = sys.argv[2]

model = AutoPeftModelForCausalLM.from_pretrained(
    path_to_adapter,  # path to the output directory
    device_map="cpu",
    trust_remote_code=True,
).eval()
merged_model = model.merge_and_unload()
# max_shard_size and safe_serialization are not required here.
# They control checkpoint sharding and saving to safetensors, respectively.
merged_model.save_pretrained(new_model_directory, safe_serialization=False)
After running it, I found that the saved model's weights are the same as the base model's. I assume this may be caused by my adapter's weights having a prefix and not being merged correctly.
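One way to confirm this symptom is to compare a weight tensor that the adapter should have modified between the base model and the merged output. A minimal sketch, assuming the paths and the parameter name below are placeholders and that the chosen layer is actually targeted by the LoRA adapter:

import torch
from transformers import AutoModelForCausalLM

# Placeholder paths: the original base model and the merged output directory.
base = AutoModelForCausalLM.from_pretrained(
    "path/to/base_model", device_map="cpu", trust_remote_code=True
)
merged = AutoModelForCausalLM.from_pretrained(
    "path/to/merged_output", device_map="cpu", trust_remote_code=True
)

# Pick a parameter the adapter targets, e.g. a q_proj weight (illustrative name).
name = "model.layers.0.self_attn.q_proj.weight"
w_base = dict(base.named_parameters())[name]
w_merged = dict(merged.named_parameters())[name]

# If the merge worked, these tensors should differ.
print("identical:", torch.allclose(w_base, w_merged))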
Expected behavior
How can I correctly merge and save such an adapter?
About this issue
- Original URL
- State: open
- Created 3 months ago
- Comments: 22 (11 by maintainers)
Thanks for sharing. This looks correct so far: when saving the adapter with PEFT, the adapter name is removed from the key, so e.g. when the adapter name is "default" (which is the default), foo.layers.0.self_attn.q_proj.lora_A.default.weight would become foo.layers.0.self_attn.q_proj.lora_A.weight. I'm not 100% sure why it's removed – probably it's so that we can load the adapter with a different adapter name later, but whatever the reason, that's what happens. In the key names you showed, there is no adapter name, so this is correct.

Later, when we load the adapter, we have to inject the adapter name back into the key, which is happening in the code snippet cited above. Looking through the code, I don't see what could go wrong for the adapter name to be injected twice, so that we end up with base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.default.weight. I thought that maybe the adapter name was not properly removed in the stored adapter file, but as you've shown, that's not the case. Ideally, if you can somehow create a dummy adapter that causes this issue, without any weights trained on your data, and share it as a safetensors file, I could do further debugging.

If that's not too much effort for you, this could certainly be a solution. I would certainly start with very little data and ensure that this time around, loading the model works, before spending too much time training.
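The on-disk key format described above can be checked directly. A minimal sketch, assuming the adapter was saved as adapter_model.safetensors (the path is a placeholder):

from safetensors import safe_open

# Stored keys should contain no adapter name, i.e. ...lora_A.weight rather than
# ...lora_A.default.weight or ...lora_A.default.default.weight.
with safe_open("path_to_adapter/adapter_model.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key)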
Alternatively, what you could try is to modify the PEFT code a little bit so that the double adapter name is removed. So e.g. in this line, add the following snippet:
It’s very blunt, but it would be interesting to see if it solves the problem.
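As a rough sketch of that kind of blunt rename (not the maintainer's original snippet: the helper name, the assumption that the doubled segment is ".default.default.", and the exact hook point in PEFT's loading code are all illustrative), one could collapse the doubled adapter-name segment in the adapter state dict keys before the weights are applied:

# Illustrative sketch: collapse a doubled adapter-name segment in the keys of
# the loaded adapter state dict before it is loaded into the model.
def dedupe_adapter_name(state_dict, adapter_name="default"):
    doubled = f".{adapter_name}.{adapter_name}."
    single = f".{adapter_name}."
    return {k.replace(doubled, single): v for k, v in state_dict.items()}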