accelerate: ValueError: weight is on the meta device, we need a `value` to put in on cpu.
System Info
- Windows 10
- Accelerate version: installed from git (recent)
- Python 3.8.0
- 4 GB GPU
- 16 GB RAM
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
I am using:
```python
BASE_MODEL = "decapoda-research/llama-7b-hf"
LORA_WEIGHTS = "tloen/alpaca-lora-7b"
```
I get this error: `ValueError: weight is on the meta device, we need a value to put in on cpu.` It is raised in modeling.py, in the function `set_module_tensor_to_device`:
```python
if old_value.device == torch.device("meta") and device not in ["meta", torch.device("meta")] and value is None:
    raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
```
More details:
- I am trying to load the model on my 4 GB GPU, so I am low on GPU memory, and I suppose a lot of offloading is performed back and forth between the CPU and the GPU.
- My code contains `model.half()`, but this gives me `RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'`, so I disabled that call in the primary script.
- After removing `half()`, my code goes further, but it fails with the error above about the meta device.
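The `addmm` error is a stock PyTorch limitation rather than an Accelerate bug: on older PyTorch builds, fp16 matrix multiplication is not implemented on CPU, so any layer offloaded to the CPU after `model.half()` hits it. A standalone sketch (newer PyTorch versions may run this without raising):

```python
import torch

a = torch.randn(2, 2, dtype=torch.float16)
b = torch.randn(2, 2, dtype=torch.float16)
try:
    _ = a @ b  # fp16 matmul on CPU
except RuntimeError as e:
    # On affected builds: "addmm_impl_cpu_" not implemented for 'Half'
    print(e)
```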
The source code is available: here
Might be related: https://github.com/huggingface/accelerate/issues/1197
Expected behavior
No error. The model and the weights are loaded (in both CPU and GPU).
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 22
I forgot to mention: I installed accelerate yesterday from git.

It all works fine in Google Colab, which has a 16 GB GPU, and the model loads OK there. I use weights not from Meta, but from Stanford Alpaca. It does not work on my laptop's 4 GB GPU when I insist on using the GPU. In CPU mode it also works on my laptop, but it takes between 20 and 40 minutes to get an answer to a prompt. So when I insist on using my 4 GB GPU, it fails somewhere in the process of moving the model back and forth between the GPU and the CPU (the two types of RAM). I do not understand the error message very well. What is a "meta" device, for example?
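To answer the question in passing: a "meta" tensor carries only shape and dtype metadata, with no storage behind it. Accelerate uses meta tensors to build a model skeleton without allocating RAM, then fills in real weights device by device. A minimal illustration:

```python
import torch

t = torch.empty(1000, 1000, device="meta")  # no memory is allocated
print(t.device, t.shape, t.is_meta)
# Reading values (e.g. t.sum().item()) fails, because there is no data behind it.
```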
@philip30 This is because the initialization under `init_empty_weights` breaks the tied weights. You need to add a `model.tie_weights()` to re-tie them afterward. This is in the documentation.
If you are not offloading anything (e.g. the device map only contains GPUs), it works for training as well.