accelerate: Training with Accelerator Fails. RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:7! (when checking argument for argument index in method wrapper__index_select)
I am trying to train a BLOOM-3B model on a setup with 8 GPUs of 20 GB each.
The training code is similar to the tutorial here: Distributed training with Accelerate. There is no “main” function used in my code.
The model is loaded with the device map “balanced_low_0”
if get_world_size() > 1:  # torch.distributed.get_world_size()
    kwargs["device_map"] = "balanced_low_0"
model = AutoModelForCausalLM.from_pretrained(model_name, **kwargs)
Some of the layers are frozen using param.requires_grad = False
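For context, device_map="balanced_low_0" asks Accelerate to spread the model's layers across all visible GPUs while deliberately keeping GPU 0 lighter, so GPU 0 has headroom (e.g. for generation outputs). The toy sketch below only illustrates that assignment policy; the function name and the fixed-fraction heuristic are invented for this example and are not Accelerate's actual implementation, which also weighs per-layer memory, tied weights, and no-split module classes:

```python
def toy_balanced_low_0(n_layers: int, n_gpus: int, low_0_fraction: float = 0.5) -> dict:
    """Toy sketch: assign n_layers across n_gpus, giving GPU 0 only a
    fraction of an even share (illustrative only, not Accelerate's code)."""
    even_share = n_layers / n_gpus
    shares = [even_share * low_0_fraction]
    shares += [(n_layers - shares[0]) / (n_gpus - 1)] * (n_gpus - 1)

    # Turn fractional shares into integer layer counts that sum to n_layers,
    # by rounding the running cumulative total.
    device_map, layer, acc, assigned = {}, 0, 0.0, 0
    for gpu, share in enumerate(shares):
        acc += share
        count = round(acc) - assigned
        assigned += count
        for _ in range(count):
            device_map[f"transformer.h.{layer}"] = gpu  # hypothetical layer names
            layer += 1
    return device_map
```

With 30 layers over 8 GPUs, GPU 0 ends up holding roughly half as many layers as each of the others.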
The accelerate config file I’m using has the following parameters:
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
gpu_ids: 0,1,2,3,4,5,6,7
downcast_bf16: 'no'
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
use_cpu: false
On launching the code with accelerate and the above config, both processes fail with the same interleaved traceback (one reports cuda:0 and cuda:7, the other cuda:1 and cuda:7):
File "/data/rg_data/pct_mai/Users/Anandamoy/anaconda3/envs/mqa_new/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 158, in forward
    return F.embedding(
File "/data/rg_data/pct_mai/Users/Anandamoy/anaconda3/envs/mqa_new/lib/python3.8/site-packages/torch/nn/functional.py", line 2199, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:7! (when checking argument for argument index in method wrapper__index_select)
I have tried with both accelerate versions 0.15.0 and 0.16.0 and the problem persists. Please help me understand what I am missing.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 16
@ananda1996ai First note that you cannot use data parallelism in conjunction with model parallelism, so num_processes in your config needs to be 1. I cannot reproduce the error; could you copy and paste here the result of model.hf_device_map so we can debug further? Note that for training, device_map="balanced" is recommended over device_map="balanced_low_0". Could you also try the just-released v0.17.0 to make sure your bug has not already been fixed?
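Concretely, applying that advice to the config from the question, the required change is num_processes (all other values below are unchanged from the original config; whether distributed_type should also change may depend on the setup):

```yaml
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
gpu_ids: 0,1,2,3,4,5,6,7
downcast_bf16: 'no'
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1  # one process; the model itself is already sharded across the GPUs
use_cpu: false
```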