transformers: Getting ValueError: model.shared.weight doesn't have any device set when running the M2M100 12B model on Colab with accelerate
System Info
I am getting the following error while using accelerate for M2M100 on Google Colab Pro; the code snippet is in the Reproduction section below.
Environment:
- Model link: https://huggingface.co/facebook/m2m100-12B-last-ckpt
- Python version: 3.10
- GPU: A100 (40 GB)
- RAM: 83.5 GB
- CUDA version: 12.0
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
```python
import torch
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoModel, M2M100Config, M2M100ForConditionalGeneration, M2M100Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

checkpoint = "facebook/m2m100-12B-last-ckpt"
config = M2M100Config.from_pretrained(checkpoint)

# Build an empty (meta-device) model to compute the device map.
with init_empty_weights():
    model = AutoModel.from_config(config)

device_map = infer_auto_device_map(model, no_split_module_classes=["M2M100Attention"])

# Manually pin the embedding and first decoder layers to CPU.
device_map["shared"] = "cpu"
device_map["encoder"] = "cpu"
device_map["decoder.embed_tokens"] = "cpu"
device_map["decoder.embed_positions"] = "cpu"
device_map["decoder.layers.0"] = "cpu"
device_map["decoder.layers.1"] = "cpu"
device_map["decoder.layers.2"] = "cpu"
device_map["decoder.layers.3"] = "cpu"

model = M2M100ForConditionalGeneration.from_pretrained(
    checkpoint,
    device_map=device_map,
    offload_folder="offload",
    offload_state_dict=True,
)
```
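A likely explanation for the ValueError (my reading, not stated in the report): the device map is computed on `AutoModel.from_config(config)`, i.e. the bare `M2M100Model`, whose top-level modules are `shared`, `encoder`, and `decoder`, while `from_pretrained` instantiates `M2M100ForConditionalGeneration`, which nests those modules under a `model.` prefix. Keys such as `"shared"` then never match `model.shared`, leaving that weight with no device. A quick diagnostic sketch:

```python
# Inspect the parameter names of the class that from_pretrained actually
# instantiates (same checkpoint as above).
from accelerate import init_empty_weights
from transformers import M2M100Config, M2M100ForConditionalGeneration

config = M2M100Config.from_pretrained("facebook/m2m100-12B-last-ckpt")
with init_empty_weights():
    lm = M2M100ForConditionalGeneration(config)

first_param = next(iter(lm.state_dict()))
print(first_param)  # expected: "model.shared.weight" -- the weight named in the error
```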
Expected behavior
Expecting the model to load properly, after which the following code is used for translation:
```python
hi_text = "La vie est comme une boîte de chocolat."

tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100-12B-last-ckpt")
encoded_hi = tokenizer(hi_text, return_tensors="pt").to("cuda")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("en"))
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
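A side note, not part of the original report: M2M100 normally needs the source language set on the tokenizer before encoding, and the sample sentence here is French, so something like the following is likely required for a correct translation:

```python
# Continuation of the snippet above; tokenizer and model already exist.
# Assumption: the input is French ("fr") and the target is English ("en").
tokenizer.src_lang = "fr"
encoded_hi = tokenizer(hi_text, return_tensors="pt").to("cuda")
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.get_lang_id("en"))
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```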
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 16 (3 by maintainers)
Yes, it worked. Thank you!
Hi @anujsahani01, can you try to put `GPTBigCodeBlock` in no split modules?

Hmm, this sounds more like you are using `infer_auto_device_map` in an inappropriate way. You should put `"M2M100EncoderLayer"` and `"M2M100DecoderLayer"` inside `_no_split_modules`. Could you try again with these new values? Also, can you share a handy reproducible snippet? 🙏
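For reference, a minimal sketch of that suggestion, assuming the same checkpoint as in the report: the empty model is built with the same class that `from_pretrained` instantiates, and the no-split classes are the layer names from the comment above.

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import M2M100Config, M2M100ForConditionalGeneration

checkpoint = "facebook/m2m100-12B-last-ckpt"
config = M2M100Config.from_pretrained(checkpoint)

with init_empty_weights():
    empty_model = M2M100ForConditionalGeneration(config)

# Keep whole encoder/decoder layers together on one device instead of
# splitting inside the attention blocks.
device_map = infer_auto_device_map(
    empty_model,
    no_split_module_classes=["M2M100EncoderLayer", "M2M100DecoderLayer"],
)

model = M2M100ForConditionalGeneration.from_pretrained(
    checkpoint,
    device_map=device_map,
    offload_folder="offload",
    offload_state_dict=True,
)
```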