diffusers: StableDiffusionPipeline producing unexpected output with MPS device using diffusers==0.4.0
Describe the bug
I tried testing the potential speed improvements of diffusers 0.4.0 on my M1 Mac using an existing StableDiffusionPipeline-based script, and I found that a large image that would take ~3 min to generate in diffusers 0.3.0 was estimated to take more than 10x as long.
Since my existing script had a lot going on (e.g. large resolutions, attention slicing), I tried to diagnose the problem with a minimal script (see below), running in two identical environments, with the only difference being the diffusers version.
In diffusers 0.3.0, it takes ~35 seconds to generate a reasonable result like this:
In diffusers 0.4.0, it takes ~50 seconds (slower than 0.3.0, but better than the 10x performance hit I was getting before), but each attempt (with varying seeds) triggered the NSFW filter. With the filter disabled, the results appear to be just noise:
I'm not sure whether the 10x performance hit I initially observed in my original script would be fixed by fixing this bug, but it certainly seems to be at least part of it.
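The original report doesn't show how the filter was disabled; one common workaround is to swap the safety checker for a pass-through that keeps the (images, has_nsfw_concepts) return shape. A minimal sketch, assuming `pipe` is the pipeline built in the Reproduction section below:

# Hypothetical workaround, not necessarily what was used here: replace the
# checker with a pass-through so the raw (possibly noisy) images can be inspected.
pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))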
Reproduction
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("mps")

result = pipe("dogs playing poker", generator=torch.manual_seed(1))
result.images[0].save("test.png")
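For reference, the ~35 s / ~50 s figures above are rough wall-clock times; a sketch of how to measure them and print the NSFW flag, extending the snippet above (the timing wrapper and the flag check are illustrative additions, not part of the original script):

import time

start = time.perf_counter()
result = pipe("dogs playing poker", generator=torch.manual_seed(1))
print(f"generation took {time.perf_counter() - start:.1f}s")  # ~35 s on 0.3.0 vs ~50 s on 0.4.0 reported above
print("NSFW flag per image:", result.nsfw_content_detected)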
Logs
Under 0.4.0 there's also this warning:
/opt/homebrew/Caskroom/miniforge/base/envs/sd/lib/python3.10/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:222: UserWarning: The operator 'aten::repeat_interleave.self_int' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)
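The fallback only affects the small text-embedding tensor, but repeat_interleave along dim 0 can also be written as an expand + reshape, which runs natively on MPS. A sketch of the equivalence (an illustration of a possible workaround, not necessarily the fix that landed in diffusers):

import torch

def repeat_interleave_dim0(x: torch.Tensor, n: int) -> torch.Tensor:
    # Same result as x.repeat_interleave(n, dim=0) for a (B, L, D) tensor:
    # each batch row is duplicated n times in place, e.g. [a, b] -> [a, a, b, b].
    b, l, d = x.shape
    return x.unsqueeze(1).expand(b, n, l, d).reshape(b * n, l, d)

x = torch.randn(2, 77, 768)  # e.g. two text embeddings of shape (77, 768)
assert torch.equal(repeat_interleave_dim0(x, 3), x.repeat_interleave(3, dim=0))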
System Info
- diffusers version: 0.4.0
- Platform: macOS-12.6-arm64-arm-64bit
- Python version: 3.10.6
- PyTorch version (GPU?): 1.13.0.dev20220911 (False)
- Huggingface_hub version: 0.10.0
- Transformers version: 4.21.3
- Using GPU in script?: MPS
- Using distributed or parallel set-up in script?: no
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (12 by maintainers)
Commits related to this issue
- Add Resnet50 fp16 variant to pytests. (#760) — committed to nod-ai/diffusers by monorimet a year ago
@patrickvonplaten Not blaming you guys, just noting what was previously reported 😄 Please look at the bug report I filed about a month back (https://github.com/huggingface/diffusers/issues/548) for the numbers I already provided.
The numbers might have changed since 0.4.0 (and subsequent releases), but I have not re-run the benchmarks since then. I do install the latest version every once in a while (on top of PyTorch 1.13.0.dev20220924, since nightlies after that date slow down a lot too; the numbers for that are here: https://github.com/pytorch/pytorch/issues/86048) and test when I can, but it continues to be slow.
Do note, I appreciate everything you guys are doing and am simply reporting issues as I find them, since I realize that not everybody uses MPS. I wish I could do more by helping with the code, but I'm not up to working at the diffusers level; building a GUI is about all I can do at this point 😄
But if there's anything I can do to help (since I work with MPS day in and day out and work on Stable Diffusion related stuff whenever I can), do let me know.