accelerate: ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices.

@younesbelkada (Thanks again for developing these great libraries and responding on Github!)

Related issue: https://github.com/huggingface/accelerate/issues/1412

With the bleeding-edge transformers, I cannot combine PEFT and accelerate to do parameter-efficient fine-tuning with naive pipeline parallelism (i.e., splitting a model loaded in 8-bit across multiple GPUs). Do PEFT and accelerate no longer support this use case? The same code works on an earlier transformers version, so I am wondering what changed.
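For reproduction, here is a minimal sketch of the setup; the checkpoint name, LoRA hyperparameters, and dummy dataset are placeholders, not my exact script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model_name = "facebook/opt-6.7b"  # placeholder checkpoint

# Load in 8-bit and let accelerate shard the layers across the available GPUs (NPP)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Standard PEFT preparation for int8 training plus LoRA adapters
model = prepare_model_for_int8_training(model)
lora_config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Tiny dummy dataset so the sketch is self-contained
encodings = tokenizer(["hello world"] * 8, return_tensors="pt", padding=True)

class DummyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return encodings["input_ids"].shape[0]

    def __getitem__(self, i):
        return {
            "input_ids": encodings["input_ids"][i],
            "attention_mask": encodings["attention_mask"][i],
            "labels": encodings["input_ids"][i],
        }

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=DummyDataset(),
)
trainer.train()  # fails here with the ValueError below
```

Running this on a multi-GPU machine fails at trainer.train() with the following traceback: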

  File "/home/ec2-user/.local/lib/python3.7/site-packages/transformers/trainer.py", line 1665, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/home/ec2-user/.local/lib/python3.7/site-packages/transformers/trainer.py", line 1768, in _inner_training_loop
    self.model, self.optimizer, self.lr_scheduler
  File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 1144, in prepare
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 1144, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 995, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 1201, in prepare_model
    "You can't train a model that has been loaded in 8-bit precision on multiple devices."
ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices.
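For context, the guard in accelerate's prepare_model appears to boil down to the model being 8-bit quantized while its device map spans more than one GPU. A hypothetical helper that mimics that condition (not the exact accelerate source):

```python
def check_8bit_multi_device(model) -> None:
    """Hypothetical re-statement of the check behind the ValueError above."""
    is_8bit = getattr(model, "is_loaded_in_8bit", False)
    device_map = getattr(model, "hf_device_map", None) or {}
    gpu_devices = {d for d in device_map.values() if d not in ("cpu", "disk")}
    if is_8bit and len(gpu_devices) > 1:
        raise ValueError(
            "You can't train a model that has been loaded in 8-bit precision on multiple devices."
        )
```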

Here is the relevant subset of the pip3 list output showing the package versions:

Package                  Version
------------------------ -----------
accelerate               0.19.0
transformers             4.30.0.dev0
peft                     0.3.0

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 27 (12 by maintainers)

Most upvoted comments

Thank you so much @younesbelkada! Yes, (at least currently) I am NOT looking for distributed training (e.g., distributed data parallel through torchrun) when load_in_8bit (or 4-bit) is turned on, only NPP. Looking forward to https://github.com/huggingface/accelerate/pull/1523 being merged!

@dylanwwang https://github.com/huggingface/accelerate/pull/1523 should solve your error too 😃

Hi @akkikiki Thanks so much for your kind words and the report. I have dug into the problem, and it appears it was my mistake: I forgot to add an extra check. NPP should not be supported under any distributed regime by definition, as the NPP paradigm is purely sequential (i.e., it should be run just with python xxxx.py). #1523 should hopefully fix the issue. Does my explanation make sense? Please let me know if you have any questions.
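One practical way to catch this early (a hypothetical guard, not something accelerate adds for you): torchrun and accelerate launch set WORLD_SIZE in the environment, so an NPP script can refuse to run under a distributed launcher.

```python
import os

# Hypothetical guard at the top of an NPP training script
# (device_map="auto" + load_in_8bit): it is meant to be launched sequentially
# with `python train.py`, never under torchrun / accelerate launch.
if int(os.environ.get("WORLD_SIZE", "1")) > 1:
    raise RuntimeError(
        "This script uses naive pipeline parallelism, which is purely sequential; "
        "launch it with plain `python train.py` instead of a distributed launcher."
    )
```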

But even with naive Data Parallel (AllReduce), shouldn't it work like this?

device_map="auto" is not data parallelism, it’s model parallelism (your model is split across the GPUs). It is not compatible with Data parallelism. If you want to combine data parallelism and model parallelism, you need to use FSDP or DeepSpeed.

@dylanwwang this is odd; I think you have put your entire model on a single GPU. How did you initialize your model? Using device_map="auto"?

@younesbelkada the model is indeed divided across 4 GPUs, and it is initialized with device_map="auto"