InvokeAI: M1 failing with `is not currently supported on the MPS backend...`

Followed the M1 instructions on macOS 12.5 with Python 3.10.4.

.../stable-diffusion/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/miniforge3/conda-bld/pytorch-recipe_1660136240338/work/aten/src/ATen/mps/MPSFallback.mm:11.)

and

.../stable-diffusion/ldm/modules/embedding_manager.py", line 155, in forward
    embedded_text[placeholder_idx] = placeholder_embedding
NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

I could try PYTORCH_ENABLE_MPS_FALLBACK, but is that how people are getting around this issue?
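
If you do go the fallback route, a minimal sketch of what it looks like (my own example, not from the repo) is to set the variable before PyTorch is imported, e.g. at the top of the launch script or in a tiny wrapper:

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before torch is imported

import torch
print(torch.backends.mps.is_available())  # True on Apple Silicon builds with MPS support

With the variable set, only the unsupported ops (aten::nonzero, aten::_index_put_impl_) run on the CPU; everything else should still run on MPS, so it is slower but not CPU-only.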

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 24

Most upvoted comments

Confirming that the issue persists with the latest PyTorch nightly, 1.13.0.dev20220901. It also looks like the aten::nonzero op hasn’t been implemented for the MPS backend in PyTorch yet.

[Screenshot from 2022-09-02 omitted]

Yes, at least whatever part of the code uses nonzero. My Mac’s GPU seems to be under 100% load during calls to SD, however (see Activity Monitor -> Window -> GPU History).

I am getting that same warning but images are generating in well under a minute so I think it is using the GPU. Just one more data point…
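
For anyone unsure whether their install is actually using the GPU, a quick sanity check (my own snippet, not part of the repo):

import torch
print(torch.backends.mps.is_built())      # PyTorch was compiled with MPS support
print(torch.backends.mps.is_available())  # the runtime can actually reach the GPU
x = torch.ones(1, device="mps")
print(x.device)                           # expect mps:0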

I’m having the same issue; it falls back to using the CPU. Please update if you find a fix.

No luck with mamba

Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement k-diffusion==0.0.1 (from versions: none)
ERROR: No matching distribution found for k-diffusion==0.0.1

Edit: I’ve managed to install all dependencies with mamba. Now it fails with ModuleNotFoundError: No module named 'ldm'
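
For anyone else hitting that error: ldm is a package that lives inside the stable-diffusion repository itself rather than on PyPI, so it only imports when the scripts are run from the repo root (or when the repo root is on sys.path). A minimal sketch, with a hypothetical checkout path:

import sys
sys.path.insert(0, "/path/to/stable-diffusion")  # hypothetical path; point it at your checkout

import ldm  # should now resolve instead of raising ModuleNotFoundError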

OK, I’ve won this fight with the modules, but I’m still getting that error and generation is extremely slow.

I can’t see anything obviously wrong with your log.

I installed using mamba (only because it’s faster, but I guess theoretically this could impact it).

I’ve just run git pull and tried the whole installation process again, starting with conda env create -f environment-mac.yaml.

If you want to try an alternative, I’ve exported my environment file here. Copy it to a file called thomasaarholt_env.yml.

Create a new environment with: conda env create -f thomasaarholt_env.yml (or mamba env ...)

Then I linked (or copied) the model downloaded from Hugging Face, ran python scripts/preload_models.py, and then:

❯ python scripts/dream.py --full_precision # I just tested, and the --full_precision argument doesn't appear necessary
* Initializing, be patient...

>> cuda not available, using device mps
>> Loading model from models/ldm/stable-diffusion-v1/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using slower but more accurate full-precision math (--full_precision)
>> Setting Sampler to k_lms
>> model loaded in 9.38s

* Initialization done! Awaiting your command (-h for help, 'q' to quit)
dream> A monkey hacking into the NSA
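
For the "linked (or copied) the model" step above, a minimal sketch of the symlink in Python (the source path is hypothetical; the destination matches the path the loader prints):

import os
# Run from the repo root so the relative destination path is correct.
# The source checkpoint path is hypothetical; adjust it to wherever you downloaded the weights.
os.symlink("/path/to/sd-v1-4.ckpt", "models/ldm/stable-diffusion-v1/model.ckpt")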

SD on M1 works fine. Use the environment-mac.yaml file when creating your Python environment with conda/mamba. I am running it right now on my M1 MacBook Pro.

The warning containing aten::nonzero is still present, but the image generation works fine.
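
To illustrate why the warning by itself isn’t fatal: with PYTORCH_ENABLE_MPS_FALLBACK=1 set, only the unsupported op runs on the CPU and its result is copied back to the GPU, while the rest of the model stays on MPS. A small sketch of my own (behaviour as of the nightlies discussed in this thread):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # set before importing torch

import torch
x = torch.tensor([0.0, 1.0, 0.0, 2.0], device="mps")
idx = torch.nonzero(x)  # triggers the aten::nonzero CPU-fallback warning on these builds
print(idx.device)       # result is copied back to mps; surrounding ops stay on the GPU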

For me it isn’t so: I get the warning and it does fall back to the CPU, so generation time becomes very long, and I’ve never seen it get past 20% of 1 iteration at 5 steps. How can I get it not to fall back to the CPU?

M1 MBP 2020

Same issue with torch 1.13.0.dev20220901