diffusers: LoRA training error when running train_text_to_image_lora.py: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Describe the bug

I tried to experiment with LoRA training following examples/text_to_image/README.md#training-with-lora.

However, training fails on line 801 with RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm).

The same issue did not occur when I tried the same example (with the implementation at that time) months ago. I noticed there have been several commits since then.

I followed the README.md for installing packages and the non-LoRA training works well.

Thank you very much!

Reproduction

  1. Install packages following README.md:
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install .

Then cd into the folder examples/text_to_image and run

pip install -r requirements.txt
  2. In the directory examples/text_to_image, run the following:
export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"
accelerate launch --mixed_precision="fp16" train_text_to_image_lora.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=100 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora" \
  --validation_prompt="cute dragon creature" --report_to="wandb"

Logs

11/22/2023 08:36:20 - INFO - __main__ - ***** Running training *****
11/22/2023 08:36:20 - INFO - __main__ -   Num examples = 833
11/22/2023 08:36:20 - INFO - __main__ -   Num Epochs = 100
11/22/2023 08:36:20 - INFO - __main__ -   Instantaneous batch size per device = 1
11/22/2023 08:36:20 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
11/22/2023 08:36:20 - INFO - __main__ -   Gradient Accumulation steps = 1
11/22/2023 08:36:20 - INFO - __main__ -   Total optimization steps = 83300
Steps:   0%|                                                                                                                                | 0/83300 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./repo/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 975, in <module>
    main()
  File "./repo/diffusers/examples/text_to_image/train_text_to_image_lora.py", line 801, in main
    model_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/unet_2d_condition.py", line 1075, in forward
    sample, res_samples = downsample_block(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/unet_2d_blocks.py", line 1160, in forward
    hidden_states = attn(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/transformer_2d.py", line 375, in forward
    hidden_states = block(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention.py", line 258, in forward
    attn_output = self.attn1(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 522, in forward
    return self.processor(
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/attention_processor.py", line 1211, in __call__
    query = attn.to_q(hidden_states, *args)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/lora.py", line 433, in forward
    out = super().forward(hidden_states) + (scale * self.lora_layer(hidden_states))
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/diffusers/models/lora.py", line 220, in forward
    down_hidden_states = self.down(hidden_states.to(dtype))
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "./miniconda3/envs/diffusers_cuda117/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)

System Info

  • diffusers version: 0.24.0.dev0
  • Platform: Linux-5.4.0-144-generic-x86_64-with-glibc2.31
  • Python version: 3.9.18
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Huggingface_hub version: 0.19.4
  • Transformers version: 4.35.2
  • Accelerate version: 0.24.1
  • xFormers version: not installed
  • Using GPU in script?: <Yes>
  • Using distributed or parallel set-up in script?: <NO>

Who can help?

@sayakpaul @patrickvonplaten

About this issue

  • State: closed
  • Created 7 months ago
  • Reactions: 3
  • Comments: 17 (4 by maintainers)

Most upvoted comments

I think the reason is that the LoRA parameters are added to the UNet after the UNet has already been sent to the GPU, so the LoRA layers are actually on the CPU, which leads to the error. A simple way to fix this bug is to first add the LoRA layers to the UNet and then send them to the GPU together:

[screenshot of the suggested code change]
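For readers who cannot see the screenshot, here is a minimal sketch of that reordering in Python, assuming the variable names used by the script at the time (unet, accelerator, weight_dtype, args.rank, unet_lora_parameters) and the pre-PEFT helpers LoRALinearLayer and set_lora_layer from diffusers.models.lora:

from diffusers.models.lora import LoRALinearLayer

# Add the LoRA layers while the UNet is still on the CPU ...
unet_lora_parameters = []
for attn_processor_name, attn_processor in unet.attn_processors.items():
    # Walk down to the attention module that owns this processor.
    attn_module = unet
    for n in attn_processor_name.split(".")[:-1]:
        attn_module = getattr(attn_module, n)

    # Attach a LoRA layer to each projection (only to_q is shown here; to_k,
    # to_v and to_out[0] are handled the same way in the script).
    attn_module.to_q.set_lora_layer(
        LoRALinearLayer(
            in_features=attn_module.to_q.in_features,
            out_features=attn_module.to_q.out_features,
            rank=args.rank,
        )
    )
    unet_lora_parameters.extend(attn_module.to_q.lora_layer.parameters())

# ... and only then move the UNet (base weights plus LoRA layers) to the GPU.
unet.to(accelerator.device, dtype=weight_dtype)

The comment below achieves the same effect without reordering anything, by simply calling unet.to(...) a second time after the LoRA layers have been added.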

Hopefully this will be fixed when moving to PEFT. In the meantime, if you don’t want to revert to an older version: I had the same issue and fixed it by adding one line:

unet.to(accelerator.device, dtype=weight_dtype)

At my line 539, immediately after the LoRA weights are added and outside the loop:

    # Accumulate the LoRA params to optimize.
    unet_lora_parameters.extend(attn_module.to_q.lora_layer.parameters())
    unet_lora_parameters.extend(attn_module.to_k.lora_layer.parameters())
    unet_lora_parameters.extend(attn_module.to_v.lora_layer.parameters())
    unet_lora_parameters.extend(attn_module.to_out[0].lora_layer.parameters())

# Move the UNet, now including the freshly added LoRA layers, onto the training device.
unet.to(accelerator.device, dtype=weight_dtype)

Thanks to @IceClear and others who found that part of the UNet was on the wrong device.
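As an optional sanity check (my own addition, not part of the script), you can assert right before the training loop that every UNet parameter, LoRA layers included, now lives on a single device:

# Hypothetical check, assuming the unet variable from train_text_to_image_lora.py.
param_devices = {p.device for p in unet.parameters()}
assert len(param_devices) == 1, f"UNet parameters are spread across devices: {param_devices}"

If the assertion fails, the error message shows which devices are involved.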

I got the same error. However, reverting to the previous version, as @wellCh4n suggested, resolved the issue.

This seems like a setup problem to me as I am unable to reproduce it, even on a Google Colab: https://github.com/huggingface/diffusers/issues/5004#issuecomment-1780909598

I can reproduce the issue too. I looked at the LoRA script’s commit history: there was a recent commit with big changes, so I used the previous commit instead, and that version runs fine in my case.

Download this and replace ./diffusers/examples/text_to_image/train_text_to_image_lora.py with it. This is only a temporary solution; sadly, I’m not familiar with this code.