transformers: `.to` is not supported for `8-bit` models

System Info

Hi,

I am using a Llama model and wanted to add it to a pipeline class, but it throws an error when building the pipeline. Does anyone have a solution to this? Thank you!

@Narsil

Who can help?

@Narsil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Model

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True, max_memory=max_memory)

LLM class

from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM  # LangChain's custom-LLM interface
from transformers import pipeline


class CustomLLM(LLM):
    # tokenizer, model, num_output and model_name are defined elsewhere in the script;
    # passing device="cuda:0" here is what triggers the error below, since the 8-bit
    # model was already dispatched by device_map="auto"
    pipeline = pipeline("text-generation", tokenizer=tokenizer, model=model, device="cuda:0")

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]

        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
        " model has already been set to the correct devices and casted to the correct `dtype`."

Expected behavior

   1879 # Checks if the model has been loaded in 8-bit
   1880 if getattr(self, "is_loaded_in_8bit", False):
-> 1881     raise ValueError(
   1882         ".to is not supported for `8-bit` models. Please use the model as it is, since the"
   1883         " model has already been set to the correct devices and casted to the correct `dtype`."

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 35 (11 by maintainers)

Most upvoted comments

Still suffering from this issue with accelerate 0.20.3 and transformers 4.30.2, getting "ValueError: .to is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype."

For everyone stumbling onto this error: my solution was to use accelerate 0.20.3 and transformers 4.30.2 (not necessarily needed). With those versions the training started correctly.
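For reference, pinning those exact versions (the ones mentioned above) would look something like this:

pip install accelerate==0.20.3 transformers==4.30.2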

How did you solve this issue?

Hi @deepaksen1996, thanks for the reproducer. I managed to reproduce it. Note that device_map="auto" will automatically dispatch the model onto the correct device(s), hence there is no need to add a device argument to model_kwargs. The script below works fine for me:

from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer, pipeline
import torch

model_path="facebook/opt-350m"

config = AutoConfig.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_8bit=True, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(model_path)
params = {
    "max_length": 1024,
    "pad_token_id": 0,
    "device_map": "auto",
    "load_in_8bit": True,
}
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs=params,
)

pipe("Hello")

I should add that I'm using the bnb_4bit config, as follows:

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
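If it helps, here is a minimal sketch of how a config like this is usually wired into model loading and a pipeline. The facebook/opt-350m checkpoint is just borrowed from the example above, and the key point, as discussed in this thread, is to let device_map="auto" place the quantized model and not call .to() or pass a device to the pipeline afterwards:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_path = "facebook/opt-350m"  # placeholder checkpoint, taken from the example above
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    device_map="auto",  # dispatches the quantized model; no .to() or device= needed afterwards
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# build the pipeline from the already-placed model, without a device argument
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe("Hello")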

Okay, sure @younesbelkada, I will re-run and update if the issue still comes up.

Hi @younesbelkada,

Thank you for your answer. I was using version 4.29, but I will try a newer version. Have a good day!