parallelformers: AssertionError: Model should be on CPU before parallelization. It is more memory-efficient.
Hello, first of all, congratulations on this amazing project. It's simple, efficient, and versatile. Very useful.
In some cases one has several GPUs but not enough CPU RAM to parallelize the model.
When I load the model on GPU and then parallelize, I get the error below:
AssertionError: Model should be on CPU before parallelization. It is more memory-efficient.
It doesn’t stop the script, but it seems that the parallelization fails.
My question is: is it possible to load the initial model on GPU instead of CPU (even if it's less memory-efficient), or is that not possible at all?
Thanks!
About this issue
- State: closed
- Created 3 years ago
- Comments: 29 (16 by maintainers)
Thanks a lot @hyunwoongko. I will try the above and close this issue. I think my request goes beyond the scope of parallelformers. Thanks again!
It works great! Thanks for the quick addition! 🥇
Thanks again for the great work, that’s very useful.
I updated! Please upgrade the library using `pip install parallelformers --upgrade`; the basic flow after upgrading is sketched below. Release notes: https://github.com/tunib-ai/parallelformers/releases/tag/v1.2

We have discussed solving this problem several times; here is that discussion:
- https://github.com/pytorch/pytorch/issues/64327
- https://github.com/huggingface/transformers/issues/13548
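For reference, a minimal sketch of the basic parallelformers flow after upgrading, following the README-style usage (the checkpoint name, GPU count, and generation settings here are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

# placeholder checkpoint; load on CPU first, as parallelformers expects
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

# shard the model across 2 GPUs with fp16 weights
parallelize(model, num_gpus=2, fp16=True, verbose="detail")

# no manual .cuda() calls are needed after parallelization
inputs = tokenizer("Parallelformers is", return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, max_length=15)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```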
This issue should be solved on the PyTorch side, 😦 not the Transformers side.
On the other hand, on the DeepSpeed side there is code designed so that the partitioned model can be loaded directly onto the GPUs (`deepspeed.zero.Init`). I don't know much about the internal implementation, but it would be good to refer to it: https://deepspeed.readthedocs.io/en/latest/zero3.html#constructing-massive-models
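As a rough illustration (this is not parallelformers code, and it assumes a script launched under a distributed runtime such as the `deepspeed` launcher), the pattern from the linked docs looks like this; the model is a placeholder:

```python
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

# construct the model inside deepspeed.zero.Init so that parameters are
# partitioned across the data-parallel group as they are created, rather
# than being fully materialized on CPU first
config = AutoConfig.from_pretrained("gpt2")  # placeholder model
with deepspeed.zero.Init():
    model = AutoModelForCausalLM.from_config(config)
```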
How about using Transformers' `low_cpu_mem_usage` if you run out of CPU memory? The trade-off is slower loading. I recommend the code below to you; I think it's the best way to keep CPU memory low.
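A minimal sketch of that combination, assuming a placeholder checkpoint and two GPUs (`low_cpu_mem_usage` is a `from_pretrained` option in Transformers):

```python
from transformers import AutoModelForCausalLM
from parallelformers import parallelize

# low_cpu_mem_usage=True keeps peak CPU RAM at roughly one copy of the
# weights instead of two, at the cost of slower loading
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neo-2.7B",  # placeholder checkpoint
    low_cpu_mem_usage=True,
)

# then shard the CPU-resident model across the GPUs as usual
parallelize(model, num_gpus=2, fp16=True)
```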