diffusers: [examples/text_to_image] CUDA out of memory even though I followed the instructions for train_text_to_image.py

Describe the bug

When I ran the script examples/text_to_image/train_text_to_image.py with the following command:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export dataset_name="lambdalabs/pokemon-blip-captions"

python train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=15000 \
  --learning_rate=1e-05 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model" \
  --mixed_precision="fp16"

I tried decreasing the resolution and removing --center_crop --random_flip, but it did not help.

Hardware: V100 (32 GB), PyTorch 1.11.
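A useful first step for this kind of OOM is to log GPU memory around the forward/backward pass of the training loop. A minimal sketch is below; report_cuda_memory is a hypothetical helper of mine, not something that exists in train_text_to_image.py:

import torch

def report_cuda_memory(tag: str) -> None:
    # Hypothetical helper: print the currently allocated and peak-allocated
    # GPU memory in GiB for the default CUDA device.
    allocated = torch.cuda.memory_allocated() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, peak={peak:.2f} GiB")

# Example usage around one training step (compute_loss is a placeholder for
# the script's own loss computation):
# report_cuda_memory("before forward")
# loss = compute_loss(batch)
# loss.backward()
# report_cuda_memory("after backward")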


Reproduction

train_1p.txt

Logs

No response

System Info

  • diffusers version: 0.17.0.dev0
  • Platform: Linux-5.4.0-60-generic-x86_64-with-debian-buster-sid
  • Python version: 3.7.5
  • PyTorch version (GPU?): 1.11.0+cu102 (True)
  • Huggingface_hub version: 0.14.1
  • Transformers version: 4.29.1
  • Accelerate version: 0.19.0
  • xFormers version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Tried both single-GPU (1p) and 8-GPU (8p) setups; neither works

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

My tests:

  • PyTorch 2 + EMA does not run in <= 24 GB. Testing on another card, I found it took ~26 GB of GPU RAM (a sketch of where the extra EMA memory can come from follows this list).
  • PyTorch 2 without ema works fine.
  • PyTorch 1.13.1 with xFormers (and using --enable_xformers_memory_efficient_attention) works fine and takes only ~14 GB of GPU RAM.
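The EMA overhead is consistent with the EMA weights being a full extra copy of the UNet parameters kept on the GPU. A minimal sketch of that pattern using diffusers' EMAModel utility; this is an illustration, not the exact code in train_text_to_image.py:

from diffusers import UNet2DConditionModel
from diffusers.training_utils import EMAModel

# Load the UNet from the checkpoint used in the report.
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

# EMAModel keeps its own shadow copy of every parameter, so holding it on the
# GPU adds roughly the UNet's full parameter memory (on the order of 3 GB in fp32).
ema_unet = EMAModel(unet.parameters())

# After each optimizer step, the shadow weights are updated in place:
# optimizer.step()
ema_unet.step(unet.parameters())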

When using PyTorch 2, I verified that we are using AttnProcessor2_0 here: https://github.com/huggingface/diffusers/blob/c6ae8837512d0572639b9f57491d4482fdc8948c/src/diffusers/models/attention_processor.py#L161. I'm not sure why it no longer fits in 24 GB.
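For anyone who wants to reproduce that check locally, here is a small sketch of how to inspect which attention processor a loaded UNet ends up with, and how to set AttnProcessor2_0 explicitly; the model path and dtype are just examples:

import torch
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0

unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet", torch_dtype=torch.float16
)

# Show which processor classes are currently assigned to the attention layers.
print({type(proc).__name__ for proc in unet.attn_processors.values()})

# Explicitly select the PyTorch 2 scaled_dot_product_attention path.
if hasattr(torch.nn.functional, "scaled_dot_product_attention"):
    unet.set_attn_processor(AttnProcessor2_0())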