diffusers: [OOM] Memory blows out when trying to upscale images larger than 128x128 using StableDiffusionUpscalePipeline
Describe the bug
When trying to upscale images larger than 128x128 the progress goes to 100% and then crashes with CUDA OOM.
With 512x512 images it’s trying to allocate 256.00 GiB!
Reproduction
import requests
from PIL import Image
from io import BytesIO
from diffusers import StableDiffusionUpscalePipeline
import torch
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipeline = StableDiffusionUpscalePipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16)
pipeline = pipeline.to("cuda")
url = "https://www.freepnglogos.com/uploads/512x512-logo/512x512-transparent-circle-instagram-media-network-social-logo-new-16.png"
response = requests.get(url)
low_res_img = Image.open(BytesIO(response.content)).convert("RGB")
prompt=""
upscaled_image = pipeline(prompt=prompt, image=low_res_img).images[0]
display(upscaled_image)
Logs
RuntimeError: CUDA out of memory. Tried to allocate 256.00 GiB (GPU 0; 14.76 GiB total capacity; 4.77 GiB already allocated; 8.28 GiB free; 5.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
System Info
- diffusers version: 0.9.0
- Platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.15
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- Huggingface_hub version: 0.11.0
- Transformers version: 4.24.0
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (12 by maintainers)
I know this is closed and these things are in the docs, but just wanted to say that if you’re running into this issue, install the following:
And to add this to your pipeline:
You can upscale a 512x image with a ~20GB GPU (I didn’t try with less) with the linked PR and using xformers for the attention in the VAE (when properly picked up by the enablement, hence another PR). I have this running just fine on a private fork. It looks like all the missing pieces are arriving here (see this PR); otherwise I can PR the required missing bits.
I tried using it with xformers, I believe, and I think I got the same issue… I can re-run it. But the issue occurs when creating this empty tensor in the default attention block:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py#L331-L337
I implemented probably the most simplistic form of tiling possible here: https://github.com/carson-katri/dream-textures/blob/aa0132b42dd14ddbf9491c13a7a46a01da2c880a/generator_process/actions/upscale.py
I’m sure there are much better approaches that would limit seams. Perhaps just tiling the latent decoding process? Not entirely sure. Looking forward to seeing the improvements that will be made in this pipeline!
I’m working on a tile-based solution that runs the upscale model on small, overlapping patches of a larger source image and then merges them back into the full-sized result. Much of the code is borrowed from the realesrgan upscaler, which supports this. Will try and publish code as soon as it’s working.
@carson-katri correct, you can try
and that should reduce some memory in exchange for a small speed decrease and enable larger inputs. With xformers installed it should be less, as you point out!
The model may support xformers and attention slicing, which could help I assume.
Thank you for the explanation, I thought this may be the case. Resizing the image to 128x128 would produce a 512x512 image, correct?
Thank you for reporting this. The reason this happens is that as your initial image gets bigger, e.g. 512x512, the latent representations end up being 512 (latent dim) x 512 (H) x 512 (W) in the decoding attention block, which gets reshaped into batch = 1, seq = 512 x 512 (H x W) and channel = 512. That obviously will not work, as vanilla attention is quadratic in memory and compute in the sequence length. Thus, the more you increase your initial image dims, the more memory it will use.
A solution for now in the above code is to downsize the initial image to something manageable e.g.:
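A rough sketch of the arithmetic behind that advice, together with a helper for picking a smaller input size. Both function names are hypothetical, and the memory estimate assumes a single attention head whose seq x seq score matrix is stored in float32 (4 bytes per element). Under that assumption the numbers line up with the 256.00 GiB in the log, but treat it as an illustration rather than a statement about the library internals.

```python
from typing import Tuple


def attention_scores_bytes(height: int, width: int, bytes_per_element: int = 4) -> int:
    """Size of one seq x seq attention score matrix, with seq = height * width.

    Assumption: one head and 4-byte (float32) elements by default.
    """
    seq_len = height * width
    return seq_len * seq_len * bytes_per_element


def fit_within(width: int, height: int, max_side: int = 128) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most `max_side`."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


GIB = 2 ** 30
print(attention_scores_bytes(512, 512) / GIB)  # 256.0
print(attention_scores_bytes(128, 128) / GIB)  # 1.0
print(fit_within(512, 512))  # (128, 128)

# With Pillow, the downsized input could then be produced as:
# low_res_img = low_res_img.resize(fit_within(*low_res_img.size))
```

Because the cost grows with the square of H x W, going from 128x128 to 512x512 multiplies the pixel count by 16 and the score matrix by 256.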
As noted below you can also install xformers or try with attention slicing:
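A hedged sketch of what that could look like. `enable_attention_slicing` and `enable_xformers_memory_efficient_attention` are method names that diffusers pipelines expose, but whether a given diffusers version provides them on this pipeline, and whether they reach the VAE attention block where the allocation fails, is an assumption here. The hypothetical helper below therefore only calls what it finds and reports what it managed to enable.

```python
from typing import Any, List

# Method names assumed to exist on the pipeline object; either may be absent
# depending on the installed diffusers version.
MEMORY_SAVERS = (
    "enable_attention_slicing",
    "enable_xformers_memory_efficient_attention",
)


def enable_memory_savers(pipe: Any) -> List[str]:
    """Call each memory-saving switch that `pipe` exposes.

    Returns the names of the switches that were enabled without error.
    """
    enabled = []
    for name in MEMORY_SAVERS:
        method = getattr(pipe, name, None)
        if not callable(method):
            continue  # this pipeline does not offer the switch
        try:
            method()
        except Exception as exc:  # e.g. xformers is not installed
            print(f"skipping {name}: {exc}")
        else:
            enabled.append(name)
    return enabled
```

With the reproduction script above, this would be called as `enable_memory_savers(pipeline)` after `pipeline.to("cuda")`. An empty return value means neither switch was available, in which case downsizing the input remains the fallback.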