accelerate: Error loading/training LlamaForSequenceClassification with device_map="auto", load_in_8bit=True and fp16=True

System Info

- `Accelerate` version: 0.18.0.dev0
- Platform: Linux-5.15.0-1033-aws-x86_64-with-glibc2.31
- Python version: 3.11.3
- Numpy version: 1.24.2
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- `Accelerate` default config:
	Not found

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
  • My own task or dataset (give details below)

Reproduction

I'm doing some research using the LLaMA model as a causal LM without any problems, but when I try to load the model for sequence classification, I can't load it using the Accelerate tools.

import torch
from peft import PeftConfig, PeftModel
import bitsandbytes as bnb
from accelerate import init_empty_weights, load_checkpoint_and_dispatch, load_checkpoint_in_model, infer_auto_device_map
from transformers import (
    AutoConfig,
    pipeline,
    TrainingArguments,
    Trainer,
    DataCollatorWithPadding,
    LlamaTokenizer,
    LlamaForSequenceClassification,
    LlamaConfig,
    AutoModelForSequenceClassification,
    AutoModelForCausalLM,
)
from pathlib import Path

# model_path points at my local LLaMA checkpoint; id2label/label2id are my label maps
model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=model_path,
    id2label=id2label,
    label2id=label2id,
    load_in_8bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
)

Crashes with the following trace:

Traceback (most recent call last):
  File "/home/ubuntu/LLaMA/training_test.py", line 30, in <module>
    model = AutoModelForSequenceClassification.from_pretrained(pretrained_model_name_or_path=model_path,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/LLaMA/transformers/src/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/LLaMA/transformers/src/transformers/modeling_utils.py", line 2846, in from_pretrained
    dispatch_model(model, device_map=device_map, offload_dir=offload_folder, offload_index=offload_index, offload_buffers=True)
  File "/home/ubuntu/LLaMA/accelerate/src/accelerate/big_modeling.py", line 370, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "/home/ubuntu/LLaMA/accelerate/src/accelerate/hooks.py", line 478, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "/home/ubuntu/LLaMA/accelerate/src/accelerate/hooks.py", line 155, in add_hook_to_module
    module = hook.init_hook(module)
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/LLaMA/accelerate/src/accelerate/hooks.py", line 251, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device)
  File "/home/ubuntu/LLaMA/accelerate/src/accelerate/utils/modeling.py", line 136, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
ValueError: weight is on the meta device, we need a `value` to put in on 0.

I read some issues about this error, but nothing worked. After some debugging, I think the loading process fails while trying to load the score layer, because set_module_tensor_to_device receives the nn.Linear module without any weights. I'm not at all sure that this is the actual cause.
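To illustrate the hypothesis, here is a minimal sketch (my own construction, not the library internals) that raises the same ValueError: a layer created on the meta device has no weight value that set_module_tensor_to_device could materialize on the target GPU.

import torch.nn as nn
from accelerate import init_empty_weights
from accelerate.utils import set_module_tensor_to_device

with init_empty_weights():
    layer = nn.Linear(4096, 2)  # weight and bias are allocated on the meta device

# Without a `value=` argument there is nothing to materialize, so this raises:
# ValueError: weight is on the meta device, we need a `value` to put in on 0.
set_module_tensor_to_device(layer, "weight", 0)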

I can load & train the model as follows:

model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=model_path,
    id2label=id2label,
    label2id=label2id,
    torch_dtype=torch.float16,
).to("cuda")

The problem is that this only trains on a single GPU, and my dataset is large. When I try an instance with multiple GPUs, the Trainer loads the whole model onto each GPU and I get an OOM error.
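What I want is sharding of the model across the GPUs rather than full replication, which is what device_map="auto" should provide once loading works. As a sketch, the per-GPU memory could also be capped explicitly; the max_memory sizes below are illustrative assumptions, not measured values:

model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path=model_path,
    id2label=id2label,
    label2id=label2id,
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB", "cpu": "64GiB"},
    torch_dtype=torch.float16,
)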

Probably I'm missing something…
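In case it helps with triage, the lower-level Accelerate path built from the imports above presumably hits the same wall, since the score weight is simply absent from the checkpoint. A sketch, with no_split_module_classes as an assumption:

from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, LlamaForSequenceClassification

config = AutoConfig.from_pretrained(model_path)
with init_empty_weights():
    model = LlamaForSequenceClassification(config)  # every parameter starts on meta

# The checkpoint has no entry for the new score head, so it presumably
# stays on the meta device and trips the same ValueError on dispatch.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint=model_path,
    device_map="auto",
    no_split_module_classes=["LlamaDecoderLayer"],
)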

Expected behavior

Load and train the model without any issues.


Most upvoted comments

Yeah, now everything seems to work. I just started a training run with device_map="auto" and the model loads correctly on multiple GPUs. 🎉

What was the problem? I don’t know much about meta devices.

On my side, it’s fixed with the PR above.