accelerate: ValueError: weight is on the meta device, we need a `value` to put in on cpu.

System Info

Windows 10
Accelerate Version: from git (recent)
Python 3.8.0
4GB GPU
16GB RAM

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

I am using:

BASE_MODEL = "decapoda-research/llama-7b-hf"
LORA_WEIGHTS = "tloen/alpaca-lora-7b"

I get this error: ValueError: weight is on the meta device, we need a `value` to put in on cpu. It is raised in modeling.py, in the function set_module_tensor_to_device:

    if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
        raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
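For context, here is a simplified, self-contained sketch of that guard (not the real accelerate source; the function signature and return value are illustrative). It shows why the error fires: a parameter still on the "meta" device has no data behind it, so moving it to a real device requires an explicit replacement value.

```python
# Simplified sketch (not the actual accelerate code) of the guard that
# raises this ValueError: a tensor on the "meta" device holds no data,
# so moving it to a real device needs an explicit `value`.
def set_module_tensor_to_device(tensor_name, old_device, target_device, value=None):
    if old_device == "meta" and target_device != "meta" and value is None:
        raise ValueError(
            f"{tensor_name} is on the meta device, we need a `value` "
            f"to put in on {target_device}."
        )
    # In the real function the tensor would be copied to target_device;
    # here we just return the supplied value to keep the sketch runnable.
    return value

# Supplying a concrete value lets the move succeed; omitting it
# reproduces the error from the traceback above.
result = set_module_tensor_to_device("model.weight", "meta", "cpu", value=[0.0, 0.1])
```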

More details:

  • I am trying to load the model on my 4GB GPU, so GPU memory is very tight and I assume a lot of offloading happens back and forth between the CPU and the GPU.
  • My code calls model.half(), but that gives me RuntimeError: "addmm_impl_cpu_" not implemented for 'Half', so I disabled that call in the main script.
  • With half() disabled the code gets further, but then fails with the meta-device error above.
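When GPU memory is this tight, accelerate lets you control where each part of the model lives via a device_map. Below is a minimal sketch of a hand-written map; the module names are illustrative only, not the actual LLaMA layer names, and in practice device_map="auto" computes a similar map from available memory.

```python
# Hypothetical device_map: keep the first layers on the 4GB GPU
# (device index 0) and offload the rest to CPU RAM.
# Module names here are illustrative, not real LLaMA layer names.
device_map = {
    "model.embed_tokens": 0,   # GPU 0
    "model.layers.0": 0,
    "model.layers.1": 0,
    "model.layers.2": "cpu",   # offloaded to CPU RAM
    "model.layers.3": "cpu",
    "lm_head": "cpu",
}

# Every module must be assigned to a valid device; quick sanity check:
assert all(v in (0, "cpu", "disk") for v in device_map.values())
```

Such a map can be passed to from_pretrained(..., device_map=device_map); accelerate then dispatches each module to its assigned device and offloads the rest.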

The source code is available: here

Might be related: https://github.com/huggingface/accelerate/issues/1197

Expected behavior

No error. The model and the weights are loaded (in both CPU and GPU).

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 22

Most upvoted comments

I forgot to mention: I installed accelerate yesterday from Git. Everything works fine in Google Colab, which has a 16 GB GPU, and the model loads there without issues. I use the weights from Stanford Alpaca, not from Meta. On my laptop with a 4GB GPU it fails whenever I insist on using the GPU. In CPU-only mode it also works on the laptop, but an answer to a prompt takes between 20 and 40 minutes. So when I force the 4GB GPU, it fails somewhere while shuttling the model back and forth between GPU and CPU memory. I do not understand the error message very well. What is a "meta" device, for example?
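To answer that question: PyTorch's "meta" device holds tensors that record only shape and dtype, with no data allocated, which is how accelerate builds a huge model skeleton instantly under init_empty_weights before streaming in the real weights. A rough plain-Python analogy (this is not the torch API, just an illustration of the idea):

```python
# Rough analogy in plain Python (NOT the torch API): a "meta" tensor
# knows its shape but allocates no storage, so it cannot be moved to a
# real device until it is given concrete values.
class MetaTensor:
    def __init__(self, shape):
        self.shape = shape
        self.data = None  # no storage allocated yet

    def materialize(self, values):
        # Supplying real values plays the role of the `value` argument
        # in accelerate's set_module_tensor_to_device.
        if len(values) != self.shape[0]:
            raise ValueError("shape mismatch")
        self.data = list(values)
        return self

t = MetaTensor((3,))           # shape is known, data is not
t.materialize([0.1, 0.2, 0.3]) # now it holds real values
```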

@philip30 This happens because initialization under init_empty_weights breaks the tied weights. You need to call model.tie_weights() afterward to re-tie them:

from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoModelWithLMHead
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

whisper_model = "openai/whisper-tiny"
weights_location = hf_hub_download(whisper_model, "pytorch_model.bin")
config = AutoConfig.from_pretrained(whisper_model)
with init_empty_weights():
    model = AutoModelWithLMHead.from_config(config)
model.tie_weights()  # re-tie the weights broken by init_empty_weights
model = load_checkpoint_and_dispatch(model, weights_location, device_map="auto")

This is in the documentation

If you are not offloading anything (e.g. the device map only contains GPUs), it works for training as well.