transformers: `.to` is not supported for `8-bit` models

System Info

Hi,

I am using a Llama model and wanted to add it to a pipeline class, but it throws an error when building the pipeline. Does anyone have a solution to this? Thank you!

@Narsil

Who can help?

@Narsil

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Model

model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", load_in_8bit=True, max_memory=max_memory)

LLM class

from typing import Any, List, Mapping, Optional

from langchain.llms.base import LLM  # LangChain's custom-LLM interface
from transformers import pipeline


class CustomLLM(LLM):
    # tokenizer, model, num_output and model_name are defined elsewhere in the script;
    # passing device="cuda:0" here is what triggers the error below, since the 8-bit
    # model was already dispatched by device_map="auto"
    pipeline = pipeline("text-generation", tokenizer=tokenizer, model=model, device="cuda:0")

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]

        # only return newly generated tokens
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
        " model has already been set to the correct devices and casted to the correct `dtype`."

Expected behavior

   1879 # Checks if the model has been loaded in 8-bit
   1880 if getattr(self, "is_loaded_in_8bit", False):
-> 1881     raise ValueError(
   1882         ".to is not supported for `8-bit` models. Please use the model as it is, since the"
   1883         " model has already been set to the correct devices and casted to the correct `dtype`."

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 35 (11 by maintainers)

Most upvoted comments

Still suffering from this issue with accelerate 0.20.3 and transformers 4.30.2, getting "ValueError: .to is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype."

For everyone stumbling onto this error: my solution was to use accelerate 0.20.3 and transformers 4.30.2 (not necessarily needed). With those versions the training started correctly.
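For reference, pinning those exact versions (the ones mentioned above) would look something like this:

pip install accelerate==0.20.3 transformers==4.30.2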

How did you solve this issue?

Hi @deepaksen1996, thanks for the reproducer. I managed to reproduce it. Note that device_map="auto" will automatically dispatch the model onto the correct device(s), hence there is no need to add a device argument to model_kwargs. The script below works fine for me:

from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer, pipeline
import torch

model_path="facebook/opt-350m"

config = AutoConfig.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_8bit=True, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(model_path)
params = {
    "max_length": 1024,
    "pad_token_id": 0,
    "device_map": "auto",
    "load_in_8bit": True,
}
pipe = pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs=params,
)

pipe("Hello")

I should add that I'm using the bnb_4bit config, as follows:

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
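If it helps, here is a minimal sketch of how a config like this is usually wired into model loading and a pipeline. The facebook/opt-350m checkpoint is just borrowed from the example above, and the key point, as discussed in this thread, is to let device_map="auto" place the quantized model and not call .to() or pass a device to the pipeline afterwards:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_path = "facebook/opt-350m"  # placeholder checkpoint, taken from the example above
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    device_map="auto",  # dispatches the quantized model; no .to() or device= needed afterwards
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# build the pipeline from the already-placed model, without a device argument
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
pipe("Hello")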

Okay, sure @younesbelkada, I will re-run and update if the issue still comes up.

Hi @younesbelkada,

Thank you for your answer. I was using version 4.29, but I will try a newer version. Have a good day!