peft: `get_base_model()` is returning the base model with the LoRA still applied.

Technically, I’m grabbing `.base_model.model` directly rather than calling `get_base_model()`, but that should have the same effect, since that’s all `get_base_model()` does when the `active_peft_config` is not a `PromptLearningConfig`, as seen here.
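For reference, the relevant part of `get_base_model()` looks roughly like this in the PEFT version I’m reading (paraphrased, so treat it as a sketch rather than the exact source):

    def get_base_model(self):
        # Prompt-learning methods leave the base model untouched, so it can be
        # returned as-is; LoRA and similar tuners wrap it in a LoraModel, so the
        # original transformers model sits one level deeper at .base_model.model.
        return (
            self.base_model
            if isinstance(self.active_peft_config, PromptLearningConfig)
            else self.base_model.model
        )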

After loading a llama model with a LoRA, like so:

    shared.model = PeftModel.from_pretrained(shared.model, Path(f"{shared.args.lora_dir}/{lora_names[0]}"), **params)

The PeftModel loads fine and everything works as expected. However, I cannot figure out how to get the original model back without the LoRA still being active when I run inference.

The code I’m using is from here:

    shared.model.disable_adapter()
    shared.model = shared.model.base_model.model

This gives me the model back as a `LlamaForCausalLM`, but when I run inference, the LoRA is still applied. I made a couple of test LoRAs so that there would be no question as to whether the LoRA is still loaded; they can be found here: https://huggingface.co/clayshoaf/AB-Lora-Test
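As a quick sanity check (just a sketch, and it assumes the LoRA targets `q_proj`; adjust to whatever modules your adapter actually targets), I inspect one of the attention projections after grabbing `.base_model.model`:

    # After shared.model = shared.model.base_model.model, this still prints a
    # PEFT LoRA wrapper class (e.g. lora.Linear8bitLt when loaded in 8-bit)
    # rather than a plain linear layer, because the layers were replaced in place.
    print(type(shared.model.model.layers[0].self_attn.q_proj))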

I am digging around right now, and I see this line, `if isinstance(module, LoraLayer):`, in:

    def _set_adapter_layers(self, enabled=True):
        for module in self.model.modules():
            if isinstance(module, LoraLayer):
                module.disable_adapters = False if enabled else True

So I checked in the program, and if I load a LoRA and run

    [module for module in shared.model.base_model.model.modules() if hasattr(module, "disable_adapters")]

it returns a bunch of modules that are of the type Linear8bitLt (if loaded in 8bit) or Linear4bitLt (if loaded in 4bit).

Would it work to set those modules’ `disable_adapters` attribute to `True` directly? I don’t want to hack around too much in the code, because I don’t have a deep enough understanding to be sure that I won’t mess something else up in the process.
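For concreteness, this is the kind of thing I had in mind (a sketch only; I’m assuming `LoraModel` exposes `disable_adapter_layers()`/`enable_adapter_layers()` as thin wrappers around `_set_adapter_layers()`):

    # Sketch only: peft_model stands for the PeftModel returned by
    # PeftModel.from_pretrained(), *before* grabbing .base_model.model.
    peft_model.base_model.disable_adapter_layers()  # sets disable_adapters = True on each LoraLayer
    # ... run inference here, hopefully without the LoRA applied ...
    peft_model.base_model.enable_adapter_layers()   # turn the LoRA back on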

If that won’t work, is there something else that I should be doing?

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 17 (1 by maintainers)

Most upvoted comments

Related: given how deeply this architecture is baked into PEFT, I’ve made a request to vLLM to try to implement this in a way that lets you load multiple LoRA adapters independently while still being able to access the base model.

After reading the code very closely, I don’t think this is going to get fixed. The only way to access the base model without any adapter replacements is to do

    with model.disable_adapter():
        # stuff with base model

When PEFT loads LoRA adapter weights, it walks through the base model’s modules and swaps some of them out for LoRA replacements. This is why, whether you access the base model directly or keep a separate reference to it as in my example, inference always picks up the adapter’s influence. My guess is that changing this would be too big of an architectural change.
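For what it’s worth, this is roughly the pattern I’ve ended up with (a sketch only; the model id, adapter path, and prompt are placeholders, not taken from this issue):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    # Placeholder ids/paths -- substitute your own llama checkpoint and adapter
    # directory (e.g. one of the test LoRAs linked above).
    base_id = "huggyllama/llama-7b"
    adapter_path = "path/to/lora-adapter"

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")

    # This patches LoRA layers into `model` in place; keeping the old reference
    # around does NOT give you an unmodified copy of the base model.
    model = PeftModel.from_pretrained(model, adapter_path)

    inputs = tokenizer("A B A B A", return_tensors="pt").to(model.device)

    with torch.no_grad():
        lora_out = model.generate(**inputs, max_new_tokens=20)   # adapter active

    with model.disable_adapter():                                # adapter bypassed inside this block
        with torch.no_grad():
            base_out = model.generate(**inputs, max_new_tokens=20)

    print(tokenizer.decode(lora_out[0], skip_special_tokens=True))
    print(tokenizer.decode(base_out[0], skip_special_tokens=True))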

My request to the PEFT authors, however, is to be much more explicit that this is how LoRA adapter inference is implemented. I found this behavior very surprising, and it isn’t explicitly documented anywhere that I saw.

Is anyone looking into this? It still feels very unexpected for the base model to be modified in place when creating a PEFT model on top of it.

I don’t think disable_adapter() works… I made an extension for the webui and I can switch LoRAs fine, but disable_adapter() does nothing: it doesn’t disable the LoRA, which is still applied even after calling it. I’d look at the problem there. I think this should probably be filed as a new issue.
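One thing worth checking, if I’m reading the PEFT source correctly: `disable_adapter()` is implemented as a context manager, so calling it as a plain function is effectively a no-op. A quick sketch of the difference, with `model` standing in for an already-loaded PeftModel and `inputs` for pre-tokenized inputs:

    # Effectively a no-op: this only builds the context-manager object and then
    # discards it without ever entering it, so the LoRA stays active.
    model.disable_adapter()

    # The LoRA layers are only bypassed while execution is inside the `with` block:
    with model.disable_adapter():
        output = model.generate(**inputs)  # base-model behavior
    # Once the block exits, the adapter is applied again.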