diffusers: [OOM] Memory blows out when trying to upscale images larger than 128x128 using StableDiffusionUpscalePipeline

Describe the bug

When trying to upscale images larger than 128x128, the progress bar reaches 100% and then the run crashes with a CUDA OOM error.

With 512x512 images, it tries to allocate 256.00 GiB!

Reproduction

import requests
from io import BytesIO

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline
from IPython.display import display  # display() is only predefined inside notebooks

# Load the x4 upscaler in half precision and move it to the GPU
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")

# Download a 512x512 test image; inputs this large trigger the OOM
url = "https://www.freepnglogos.com/uploads/512x512-logo/512x512-transparent-circle-instagram-media-network-social-logo-new-16.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
prompt = ""
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
display(upscaled_image)

Logs

RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 14.76 GiB total capacity; 4.77 GiB already allocated; 8.28 GiB free; 5.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

System Info

diffusers version: 0.9.0
Platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.7.15
PyTorch version (GPU?): 1.12.1+cu113 (True)
Huggingface_hub version: 0.11.0
Transformers version: 4.24.0

About this issue

State: closed
Created 2 years ago
Comments: 17 (12 by maintainers)

Most upvoted comments

I know this is closed and these things are in the docs, but I just wanted to say that if you're running into this issue, install the following:

pip install xformers
pip install triton==2.0.0.dev20221120

and add this to your pipeline:

import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)
# Workaround: Flash Attention does not accept the VAE's attention shape, so let xformers pick the op for the VAE
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)

eolszewski on Feb 8, 2023

You can upscale a 512x512 image with a ~20GB GPU (I didn't try with less), with the linked PR and using xformers in the VAE's attention layers (when properly picked up by the enablement, hence another PR). I have this running just fine on a private fork. It looks like all the missing pieces are arriving here (see this PR); otherwise I can PR the required missing bits.

blefaudeux on Dec 1, 2022

I believe I tried it with xformers, and I think I got the same issue; I can re-run it. But the issue occurs when this empty tensor is created in the default attention block:

https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L331-L337

kashif on Nov 30, 2022

I implemented probably the simplest form of tiling possible here: https://github.com/carson-katri/dream-textures/blob/aa0132b42dd14ddbf9491c13a7a46a01da2c880a/generator_process/actions/upscale.py

I’m sure there are much better approaches that would limit seams. Perhaps just tiling the latent decoding process? Not entirely sure. Looking forward to seeing the improvements that will be made in this pipeline!
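A minimal sketch of the simplest kind of tiling, assuming a fixed 4x scale factor: split the low-res input into non-overlapping tiles, upscale each one separately, and paste each result at 4x the tile's coordinates. `tile_boxes` is a hypothetical helper that only computes PIL-style `(left, upper, right, lower)` boxes; it does not call the pipeline and is not the code from the linked dream-textures file. Because the tiles do not overlap, visible seams are expected.

```python
def tile_boxes(width, height, tile=128, scale=4):
    """Split a width x height image into non-overlapping tiles.

    Returns (crop_box, paste_box) pairs in PIL (left, upper, right, lower)
    order: crop_box addresses the low-res input, paste_box addresses the
    upscaled output (every coordinate multiplied by `scale`).
    Edge tiles are clipped, so they may be smaller than `tile`.
    """
    boxes = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            right = min(left + tile, width)
            lower = min(top + tile, height)
            crop_box = (left, top, right, lower)
            paste_box = tuple(c * scale for c in crop_box)
            boxes.append((crop_box, paste_box))
    return boxes


# A 512x512 input becomes a 4x4 grid of 128x128 tiles.
boxes = tile_boxes(512, 512)
print(len(boxes))  # 16
print(boxes[0])    # ((0, 0, 128, 128), (0, 0, 512, 512))
print(boxes[-1])   # ((384, 384, 512, 512), (1536, 1536, 2048, 2048))
```

Each crop could then be upscaled with `pipeline(prompt=prompt, image=low_res_img.crop(crop_box)).images[0]` and pasted into a blank `Image.new("RGB", (4 * width, 4 * height))` canvas with `canvas.paste(tile_result, paste_box[:2])`.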

carson-katri on Nov 28, 2022

I'm working on a tile-based solution that runs the upscale model on small, overlapping patches of a larger source image and then merges them back into the full-sized result. Much of the code is borrowed from the Real-ESRGAN upscaler, which supports this. I will try to publish the code as soon as it's working.
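One common way to merge overlapping patches is a feathered (weighted-average) blend: each tile's contribution ramps up linearly from its edges, so overlapping tiles cross-fade instead of leaving a hard seam. The sketch below is a 1-D illustration under that assumption, with hypothetical helpers `tile_starts` and `blend_1d`; it is not the Real-ESRGAN tiling code or un1tz3r0's implementation. Feeding it tiles cut from a known signal checks that the blend reproduces that signal exactly.

```python
def tile_starts(length, tile, overlap):
    """Offsets of `tile`-sized windows covering [0, length) where neighbors
    share at least `overlap` samples; the last window is shifted back so it
    ends exactly at `length`. Requires overlap < tile."""
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)
    return starts


def blend_1d(tiles, starts, length, overlap):
    """Merge overlapping 1-D tiles by weighted averaging. A sample's weight
    grows linearly with its distance from the tile edge (capped at
    `overlap`), so each tile fades out where its neighbor fades in."""
    acc = [0.0] * length
    weight_sum = [0.0] * length
    for values, start in zip(tiles, starts):
        n = len(values)
        for i, value in enumerate(values):
            # distance to the nearest tile edge, never below 1
            w = max(1, min(i + 1, n - i, overlap))
            acc[start + i] += w * value
            weight_sum[start + i] += w
    return [a / w for a, w in zip(acc, weight_sum)]


signal = [float(x) for x in range(512)]
starts = tile_starts(len(signal), tile=128, overlap=32)
tiles = [signal[s:s + 128] for s in starts]
print(starts)  # [0, 96, 192, 288, 384]
print(blend_1d(tiles, starts, len(signal), overlap=32) == signal)  # True
```

For images, the same idea applies per axis: the 2-D weight of a pixel can be taken as the product of its row and column ramps, and the overlap is measured in output pixels (4x the input overlap for this upscaler).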

un1tz3r0 on Nov 27, 2022

@carson-katri Correct, you can try

pipeline.enable_attention_slicing()

and that should reduce memory use in exchange for a small speed decrease, enabling larger inputs. With xformers installed, memory use should be even lower, as you point out!
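The idea behind attention slicing is to avoid materializing the whole score matrix at once. The pure-Python sketch below (hypothetical `attention` helper, single head, no batching) computes scaled dot-product attention either in one pass or in slices of `chunk` query rows, so only `chunk x len(k)` scores exist at a time while the result stays the same. My understanding is that diffusers' `enable_attention_slicing()` slices along the batch/head dimension rather than over query rows, so treat this as an illustration of the principle, not of the library's implementation.

```python
import math
import random


def softmax(row):
    # subtract the max for numerical stability
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, chunk=None):
    """Single-head scaled dot-product attention on lists of vectors.

    chunk=None builds the full len(q) x len(k) score matrix in one go;
    chunk=c only ever holds c rows of scores at a time.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    chunk = chunk or len(q)
    out = []
    for start in range(0, len(q), chunk):
        q_slice = q[start:start + chunk]
        # scores for this slice only: len(q_slice) x len(k)
        scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
                  for qi in q_slice]
        for row in scores:
            probs = softmax(row)
            # weighted sum of the value vectors, one output channel at a time
            out.append([sum(p * vj[c] for p, vj in zip(probs, v))
                        for c in range(len(v[0]))])
        # `scores` is rebound on the next iteration, so this slice can be freed
    return out


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
q, k, v = (random_matrix(rng, 10, 4) for _ in range(3))
print(attention(q, k, v) == attention(q, k, v, chunk=3))  # True
```

Each output row depends only on its own query row, which is why slicing trades a little speed (more, smaller passes) for a lower peak memory without changing the result.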

kashif on Nov 26, 2022

The model may support xformers and attention slicing, which I assume could help.

carson-katri on Nov 26, 2022

Thank you for the explanation; I thought this might be the case. Resizing the image to 128x128 would produce a 512x512 image, correct?

carson-katri on Nov 26, 2022

Thank you for reporting this. This happens because as your initial image gets bigger, e.g. 512x512, the latent representation in the decoder's attention block ends up being 512 (latent dim) x 512 (H) x 512 (W). That gets reshaped into batch = 1, seq = 512 x 512 (H x W) and channel = 512, which cannot work, because vanilla attention is quadratic in memory and compute with respect to the sequence length. Thus the larger your initial image, the more memory it will use.
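The arithmetic behind this can be checked directly. This sketch (hypothetical `attention_scores_bytes` helper) sizes only the `seq x seq` score tensor, with seq = H x W, batch = 1 and a single head as described above. Assuming the scores are held in 4-byte floats (I believe the upscale pipeline casts the VAE to float32 before decoding, but treat that as an assumption), a 512x512 input needs exactly the 256 GiB from the log, while a 128x128 input needs 1 GiB for the same tensor.

```python
GIB = 2 ** 30


def attention_scores_bytes(height, width, bytes_per_element=4, batch_heads=1):
    """Size of the (batch*heads) x seq x seq attention-score tensor when an
    H x W feature map is flattened into seq = H * W tokens."""
    seq = height * width
    return batch_heads * seq * seq * bytes_per_element


for side in (128, 256, 512):
    fp32 = attention_scores_bytes(side, side) / GIB
    fp16 = attention_scores_bytes(side, side, bytes_per_element=2) / GIB
    print(f"{side}x{side}: seq={side * side}, fp32={fp32:g} GiB, fp16={fp16:g} GiB")

# 128x128: seq=16384, fp32=1 GiB, fp16=0.5 GiB
# 256x256: seq=65536, fp32=16 GiB, fp16=8 GiB
# 512x512: seq=262144, fp32=256 GiB, fp16=128 GiB
```

Doubling the image side multiplies seq by 4 and the score tensor by 16, which is the quadratic growth described above.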

For now, a workaround for your reproduction is to downsize the initial image to something manageable, e.g.:

low_res_img = low_res_img.resize((128, 128))
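To apply the downsizing to arbitrary inputs without distorting them, a hypothetical `fit_within` helper can shrink an image so its longer side is at most 128 pixels while keeping the aspect ratio. Rounding each side down to a multiple of 8 is my own precaution against shape mismatches in the UNet, not a documented requirement; pass `multiple=1` to disable it. Since the model upscales 4x, a 128x128 input yields a 512x512 output.

```python
UPSCALE_FACTOR = 4  # stable-diffusion-x4-upscaler enlarges each side 4x


def fit_within(width, height, max_side=128, multiple=8):
    """Return a (width, height) with the same aspect ratio whose longer side
    is at most `max_side`, rounded down to a multiple of `multiple`
    (assumed precaution, see above) but never below one multiple."""
    longest = max(width, height)
    if longest > max_side:
        # integer arithmetic avoids float rounding surprises
        width = width * max_side // longest
        height = height * max_side // longest
    width = max(multiple, width // multiple * multiple)
    height = max(multiple, height // multiple * multiple)
    return width, height


size = fit_within(512, 512)
print(size, tuple(s * UPSCALE_FACTOR for s in size))  # (128, 128) (512, 512)
print(fit_within(640, 480))  # (128, 96)
```

With PIL this would be `low_res_img = low_res_img.resize(fit_within(*low_res_img.size))`, since `Image.size` is `(width, height)`.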

You can also install xformers or try attention slicing:

pipeline.enable_attention_slicing()
# pipeline.enable_xformers_memory_efficient_attention()
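If the same script has to run both with and without xformers, a small hypothetical helper can choose between the two options suggested here. It checks whether the `xformers` package is importable with `importlib.util.find_spec`, tries `enable_xformers_memory_efficient_attention()`, and falls back to `enable_attention_slicing()` if xformers is missing or enabling it raises (for example because the installed build does not match the CUDA/PyTorch version; that failure mode is an assumption). Whether these calls also cover the VAE decoder's attention block depends on the diffusers version.

```python
import importlib.util


def enable_memory_savings(pipeline):
    """Enable xformers attention if possible, else attention slicing.

    `pipeline` is any diffusers pipeline object, e.g. the
    StableDiffusionUpscalePipeline from the reproduction above.
    Returns the name of the option that was enabled.
    """
    if importlib.util.find_spec("xformers") is not None:
        try:
            pipeline.enable_xformers_memory_efficient_attention()
            return "xformers"
        except Exception as err:  # assumed failure mode: incompatible xformers build
            print(f"Could not enable xformers ({err}); using attention slicing instead")
    pipeline.enable_attention_slicing()
    return "attention_slicing"
```

It could be called right after `pipeline = pipeline.to("cuda")`, e.g. `print(enable_memory_savings(pipeline))`.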

kashif on Nov 26, 2022