transformers: Issue Loading 4-bit and 8-bit language models: ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
System Info
I’m running into an issue where I’m unable to load a 4-bit or 8-bit quantized version of the Falcon or LLaMA models on Colab. This was working 2-3 weeks ago, around June 8th. Does anyone know of a fix, or why it stopped working?
- transformers version: 4.31.0.dev0
- Platform: Linux-5.15.107+-x86_64-with-glibc2.31
- Python version: 3.10.12
- Huggingface_hub version: 0.15.1
- Safetensors version: 0.3.1
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Tensorflow version (GPU?): 2.12.0 (True)
- Flax version (CPU?/GPU?/TPU?): 0.6.11 (gpu)
- Jax version: 0.4.10
- JaxLib version: 0.4.10
Who can help?
@ArthurZucker @younesbelkada @sgugger
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Running on an A100 in Colab Pro.
!pip install git+https://www.github.com/huggingface/transformers
!pip install git+https://github.com/huggingface/accelerate
!pip install bitsandbytes
!pip install einops
from transformers import AutoModelForCausalLM, AutoConfig, AutoTokenizer
import torch

model_path = "tiiuae/falcon-40b-instruct"

# Falcon ships its modeling code on the Hub, hence trust_remote_code=True.
config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)

# load_in_4bit quantizes the weights with bitsandbytes and device_map="auto"
# lets accelerate place them; this is the call that raises the ValueError.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")

# Only the inputs are moved to the GPU, never the quantized model.
input_text = "Describe the solar system."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0]))
Cell output:
Collecting git+https://www.github.com/huggingface/transformers
Cloning https://www.github.com/huggingface/transformers to /tmp/pip-req-build-6pyatvel
Running command git clone --filter=blob:none --quiet https://www.github.com/huggingface/transformers /tmp/pip-req-build-6pyatvel
warning: redirecting to https://github.com/huggingface/transformers.git/
Resolved https://www.github.com/huggingface/transformers to commit e84bf1f734f87aa2bedc41b9b9933d00fc6add98
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (3.12.2)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers==4.31.0.dev0)
Downloading huggingface_hub-0.15.1-py3-none-any.whl (236 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 236.8/236.8 kB 11.6 MB/s eta 0:00:00
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (23.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (6.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2022.10.31)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (2.27.1)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.31.0.dev0)
Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 114.2 MB/s eta 0:00:00
Collecting safetensors>=0.3.1 (from transformers==4.31.0.dev0)
Downloading safetensors-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 79.9 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.31.0.dev0) (4.65.0)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (2023.6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.31.0.dev0) (4.6.3)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2023.5.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.31.0.dev0) (3.4)
Building wheels for collected packages: transformers
Building wheel for transformers (pyproject.toml) ... done
Created wheel for transformers: filename=transformers-4.31.0.dev0-py3-none-any.whl size=7228417 sha256=5867afa880111a40f7b630e51d9f1709ec1131236a31c2c7fb5f97179e3d1405
Stored in directory: /tmp/pip-ephem-wheel-cache-t06u3u6x/wheels/c1/ac/11/e69d454307e735e14f4f95e575c8be27fd99835ec36f504c13
Successfully built transformers
Installing collected packages: tokenizers, safetensors, huggingface-hub, transformers
Successfully installed huggingface-hub-0.15.1 safetensors-0.3.1 tokenizers-0.13.3 transformers-4.31.0.dev0
Collecting git+https://github.com/huggingface/accelerate
Cloning https://github.com/huggingface/accelerate to /tmp/pip-req-build-76ziff6x
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/accelerate /tmp/pip-req-build-76ziff6x
Resolved https://github.com/huggingface/accelerate to commit d141b4ce794227450a105b7281611c7980e5b3d6
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (1.22.4)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (23.1)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (5.9.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (6.0)
Requirement already satisfied: torch>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from accelerate==0.21.0.dev0) (2.0.1+cu118)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.12.2)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (4.6.3)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.6.0->accelerate==0.21.0.dev0) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.6.0->accelerate==0.21.0.dev0) (16.0.6)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.6.0->accelerate==0.21.0.dev0) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.6.0->accelerate==0.21.0.dev0) (1.3.0)
Building wheels for collected packages: accelerate
Building wheel for accelerate (pyproject.toml) ... done
Created wheel for accelerate: filename=accelerate-0.21.0.dev0-py3-none-any.whl size=234648 sha256=71b98a6d4b1111cc9ca22265f6699cd552325e5f71c83daebe696afd957497ee
Stored in directory: /tmp/pip-ephem-wheel-cache-atmtszgr/wheels/f6/c7/9d/1b8a5ca8353d9307733bc719107acb67acdc95063bba749f26
Successfully built accelerate
Installing collected packages: accelerate
Successfully installed accelerate-0.21.0.dev0
Collecting bitsandbytes
Downloading bitsandbytes-0.39.1-py3-none-any.whl (97.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.1/97.1 MB 18.8 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
Successfully installed bitsandbytes-0.39.1
Collecting einops
Downloading einops-0.6.1-py3-none-any.whl (42 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.2/42.2 kB 3.8 MB/s eta 0:00:00
Installing collected packages: einops
Successfully installed einops-0.6.1
Downloading (…)lve/main/config.json: 100% 658/658 [00:00<00:00, 51.8kB/s]
Downloading (…)/configuration_RW.py: 100% 2.51k/2.51k [00:00<00:00, 227kB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- configuration_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)main/modelling_RW.py: 100% 47.1k/47.1k [00:00<00:00, 3.76MB/s]
A new version of the following files was downloaded from https://huggingface.co/tiiuae/falcon-40b-instruct:
- modelling_RW.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Downloading (…)model.bin.index.json: 100% 39.3k/39.3k [00:00<00:00, 3.46MB/s]
Downloading shards: 100% 9/9 [04:40<00:00, 29.33s/it]
Downloading (…)l-00001-of-00009.bin: 100% 9.50G/9.50G [00:37<00:00, 274MB/s]
Downloading (…)l-00002-of-00009.bin: 100% 9.51G/9.51G [00:33<00:00, 340MB/s]
Downloading (…)l-00003-of-00009.bin: 100% 9.51G/9.51G [00:28<00:00, 320MB/s]
Downloading (…)l-00004-of-00009.bin: 100% 9.51G/9.51G [00:33<00:00, 317MB/s]
Downloading (…)l-00005-of-00009.bin: 100% 9.51G/9.51G [00:27<00:00, 210MB/s]
Downloading (…)l-00006-of-00009.bin: 100% 9.51G/9.51G [00:34<00:00, 180MB/s]
Downloading (…)l-00007-of-00009.bin: 100% 9.51G/9.51G [00:27<00:00, 307MB/s]
Downloading (…)l-00008-of-00009.bin: 100% 9.51G/9.51G [00:27<00:00, 504MB/s]
Downloading (…)l-00009-of-00009.bin: 100% 7.58G/7.58G [00:27<00:00, 315MB/s]
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/lib64-nvidia did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//172.28.0.1'), PosixPath('8013'), PosixPath('http')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-a100-s-b20acq94qsrp --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true'), PosixPath('--logtostderr --listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
warn(msg)
/usr/local/lib/python3.10/dist-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
warn(msg)
Loading checkpoint shards: 100% 9/9 [05:45<00:00, 35.83s/it]
Downloading (…)neration_config.json: 100% 111/111 [00:00<00:00, 10.3kB/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-c89997e10ae9> in <cell line: 15>()
13
14 config = AutoConfig.from_pretrained(model_path, trust_remote_code=True)
---> 15 model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True, device_map="auto")
16
17 tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b-instruct")
3 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in to(self, *args, **kwargs)
1894 # Checks if the model has been loaded in 8-bit
1895 if getattr(self, "is_quantized", False):
-> 1896 raise ValueError(
1897 "`.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the"
1898 " model has already been set to the correct devices and casted to the correct `dtype`."
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
Expected behavior
The model should load and be able to run inference.
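Editor's note: the guard in modeling_utils.py fires because something upstream of the user's code (at the time, accelerate's model dispatch, per the fix linked in the comments below) still called .to() on the already-quantized model. The snippet below is a minimal sketch of 4-bit loading that never calls .to() on the model itself; it is not quoted from this thread, and the BitsAndBytesConfig arguments (in particular bnb_4bit_compute_dtype) are this editor's assumptions.

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_path = "tiiuae/falcon-40b-instruct"

# Express the quantization explicitly instead of the bare load_in_4bit flag.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",  # accelerate places the weights; never call model.to(...)
)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Sanity check: transformers sets this flag after a successful bitsandbytes load.
assert getattr(model, "is_loaded_in_4bit", False)

# Move only the inputs to the GPU; the quantized model stays where it was placed.
input_ids = tokenizer("Describe the solar system.", return_tensors="pt").input_ids.to("cuda")
outputs = model.generate(input_ids, max_length=100)
print(tokenizer.decode(outputs[0]))

If the same ValueError still appears with this pattern, the actual resolution in this thread was the accelerate fix discussed in the comments below.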
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 22 (8 by maintainers)
With https://github.com/huggingface/accelerate/pull/1652 merged, you can now install accelerate from source and it should work.
Works like a charm! Thank you very much, @younesbelkada!
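For anyone skimming, the suggested fix above as a runnable Colab cell; a sketch assuming a fresh runtime afterwards (the -U flag is this editor's addition, not quoted from the thread):

# Pull accelerate from source so the merged fix from PR #1652 is included,
# then restart the runtime before re-running the loading cell.
!pip install -U git+https://github.com/huggingface/accelerate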
Hi @Maaalik, I can confirm the accelerate PR mentioned above fixes your issue; I just tested it on Google Colab. Can you try on a new runtime / fresh environment?
Which versions of accelerate and transformers fix this issue? I am using transformers==4.36.2 and accelerate==0.26.1 and am still getting this error, @younesbelkada. The issue still exists with transformers==4.38.0 and accelerate==0.27.2.
The stacktrace is:
I do not have the answer, no need to tag me.
I went for
!pip install git+https://github.com/huggingface/transformers.git@6ce6d62b6f20040129ec9831e7c4f6576402ea42
!pip install git+https://github.com/huggingface/accelerate.git@5791d949ff93733c102461ba89c8310745a3fa79
!pip install git+https://github.com/huggingface/peft.git@e2b8e3260d3eeb736edf21a2424e89fe3ecf429d
!pip install transformers[deepspeed]
I had to include transformers[deepspeed] yesterday, and earlier today I had to cherry-pick commits to make things work…
Development is going so fast, it's hard to keep up with every change 😅
@younesbelkada I'll try running things again with that.