transformers: Getting ValueError: model.shared.weight doesn't have any device set when running the M2M100 12B model on Colab with accelerate
System Info
I am getting the following error while using accelerate for M2M100 on Google Colab Pro; the code snippet is in the Reproduction section below.
Environment:
- Model link: https://huggingface.co/facebook/m2m100-12B-last-ckpt
- Python version: 3.10
- GPU: A100 (40 GB)
- RAM: 83.5 GB
- CUDA version: 12.0
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
```python
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoModel, M2M100Config, M2M100ForConditionalGeneration, M2M100Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

checkpoint = "facebook/m2m100-12B-last-ckpt"
config = M2M100Config.from_pretrained(checkpoint)

# Build an empty (meta-device) model to compute the device map.
with init_empty_weights():
    model = AutoModel.from_config(config)

device_map = infer_auto_device_map(model, no_split_module_classes=["M2M100Attention"])

# Manually pin the embedding and first decoder layers to CPU.
device_map["shared"] = "cpu"
device_map["encoder"] = "cpu"
device_map["decoder.embed_tokens"] = "cpu"
device_map["decoder.embed_positions"] = "cpu"
device_map["decoder.layers.0"] = "cpu"
device_map["decoder.layers.1"] = "cpu"
device_map["decoder.layers.2"] = "cpu"
device_map["decoder.layers.3"] = "cpu"

model = M2M100ForConditionalGeneration.from_pretrained(
    checkpoint,
    device_map=device_map,
    offload_folder="offload",
    offload_state_dict=True,
)
```
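A likely explanation for the ValueError (my reading, not stated in the report): the device map is computed on `AutoModel.from_config(config)`, i.e. the bare `M2M100Model`, whose top-level modules are `shared`, `encoder`, and `decoder`, while `from_pretrained` instantiates `M2M100ForConditionalGeneration`, which nests those modules under a `model.` prefix. Keys such as `"shared"` then never match `model.shared`, leaving that weight with no device. A quick diagnostic sketch:

```python
# Inspect the parameter names of the class that from_pretrained actually
# instantiates (same checkpoint as above).
from accelerate import init_empty_weights
from transformers import M2M100Config, M2M100ForConditionalGeneration

config = M2M100Config.from_pretrained("facebook/m2m100-12B-last-ckpt")
with init_empty_weights():
    lm = M2M100ForConditionalGeneration(config)

first_param = next(iter(lm.state_dict()))
print(first_param)  # expected: "model.shared.weight" -- the weight named in the error
```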
Expected behavior
Expecting the model to load properly, after which the following code is used for translation:
```python
hi_text = "La vie est comme une boîte de chocolat."

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100-12B-last-ckpt")
encoded_hi = tokenizer(hi_text, return_tensors="pt").to("cuda")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("en"))
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
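A side note, not part of the original report: M2M100 normally needs the source language set on the tokenizer before encoding, and the sample sentence here is French, so something like the following is likely required for a correct translation:

```python
# Continuation of the snippet above; tokenizer and model already exist.
# Assumption: the input is French ("fr") and the target is English ("en").
tokenizer.src_lang = "fr"
encoded_hi = tokenizer(hi_text, return_tensors="pt").to("cuda")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("en"))
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```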
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 16 (3 by maintainers)
Yes, it worked. Thank you!
Hi @anujsahani01, can you try to put `GPTBigCodeBlock` in no split modules?

Hmm, this sounds more like you are using `infer_auto_device_map` in an inappropriate way. You should put `"M2M100EncoderLayer"` and `"M2M100DecoderLayer"` inside `_no_split_modules`. Could you try again with these new values? Also, can you share a handy reproducible snippet? 🙏
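For reference, a minimal sketch of that suggestion, assuming the same checkpoint as in the report: the empty model is built with the same class that `from_pretrained` instantiates, and the no-split classes are the layer names from the comment above.

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import M2M100Config, M2M100ForConditionalGeneration

checkpoint = "facebook/m2m100-12B-last-ckpt"
config = M2M100Config.from_pretrained(checkpoint)

with init_empty_weights():
    empty_model = M2M100ForConditionalGeneration(config)

# Keep whole encoder/decoder layers together on one device instead of
# splitting inside the attention blocks.
device_map = infer_auto_device_map(
    empty_model,
    no_split_module_classes=["M2M100EncoderLayer", "M2M100DecoderLayer"],
)

model = M2M100ForConditionalGeneration.from_pretrained(
    checkpoint,
    device_map=device_map,
    offload_folder="offload",
    offload_state_dict=True,
)
```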