accelerate: ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices.
@younesbelkada (Thanks again for developing these great libraries and responding on GitHub!)
Related issue: https://github.com/huggingface/accelerate/issues/1412
With bleeding-edge transformers, I cannot combine PEFT and accelerate to do parameter-efficient fine-tuning with naive pipeline parallelism (i.e., splitting a model loaded in 8-bit across multiple GPUs).
Do PEFT and accelerate not support this use case? The same code works on an earlier transformers version, so I am wondering what changed.
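For reference, here is a minimal sketch of the kind of setup described above (this is not the original script; the checkpoint name, LoRA hyperparameters, and the toy dataset are placeholders):

```python
# Minimal sketch of the failing setup; checkpoint, LoRA settings, and dataset are placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# load_in_8bit + device_map="auto" quantizes the model and shards it across
# all visible GPUs (naive pipeline parallelism).
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # placeholder checkpoint
    load_in_8bit=True,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)

# Attach LoRA adapters so only a small set of parameters is trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Tiny placeholder dataset just to make the sketch self-contained.
train_dataset = Dataset.from_dict({"input_ids": [[1, 2, 3]], "labels": [[1, 2, 3]]})

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=train_dataset,
)
trainer.train()  # raises the ValueError below when the model spans more than one GPU
```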
File "/home/ec2-user/.local/lib/python3.7/site-packages/transformers/trainer.py", line 1665, in train
ignore_keys_for_eval=ignore_keys_for_eval,
File "/home/ec2-user/.local/lib/python3.7/site-packages/transformers/trainer.py", line 1768, in _inner_training_loop
self.model, self.optimizer, self.lr_scheduler
File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 1144, in prepare
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 1144, in <genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 995, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/home/ec2-user/.local/lib/python3.7/site-packages/accelerate/accelerator.py", line 1201, in prepare_model
"You can't train a model that has been loaded in 8-bit precision on multiple devices."
ValueError: You can't train a model that has been loaded in 8-bit precision on multiple devices.
Here is the relevant subset of the pip3 list output showing the package versions:
Package Version
------------------------ -----------
accelerate 0.19.0
transformers 4.30.0.dev0
peft 0.3.0
About this issue
- State: closed
- Created a year ago
- Comments: 27 (12 by maintainers)
Thank you so much @younesbelkada! Yes, (at least currently) I am NOT looking for distributed training (e.g., distributed data parallel through torchrun) when load_in_8bit (or 4-bit) is turned on, only NPP (naive pipeline parallelism). Looking forward to https://github.com/huggingface/accelerate/pull/1523 being merged!

@dylanwwang https://github.com/huggingface/accelerate/pull/1523 should solve your error too 😃
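To make the NPP-only constraint concrete: NPP means a single training process drives the model that has been sharded across GPUs, so the script is launched with plain `python train.py` rather than torchrun. The guard below is only an illustration one could add to a training script; the function name and checks are not part of accelerate or the linked PR.

```python
# Illustrative guard (not from accelerate): refuse to run under a distributed
# launch, since an 8-bit model sharded with device_map cannot be wrapped in DDP;
# naive pipeline parallelism uses exactly one process.
import os

import torch.distributed as dist


def assert_single_process_launch() -> None:
    launched_distributed = dist.is_available() and dist.is_initialized()
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    if launched_distributed or world_size > 1:
        raise RuntimeError(
            "Distributed launch detected; run this script with plain "
            "`python train.py` (one process) when the 8-bit model is split "
            "across GPUs with device_map."
        )
```

Calling this once at the top of the training script, before building the Trainer, fails fast instead of hitting the ValueError inside accelerate.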
device_map="auto"is not data parallelism, it’s model parallelism (your model is split across the GPUs). It is not compatible with Data parallelism. If you want to combine data parallelism and model parallelism, you need to use FSDP or DeepSpeed.@younesbelkada model is indeed divided into 4 GPUs,and initialize with device_map=“auto”