InvokeAI: M1 failing with `is not currently supported on the MPS backend...`

Followed the M1 instructions on macOS 12.5 with Python 3.10.4.

.../stable-diffusion/ldm/modules/embedding_manager.py:152: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at  /Users/runner/miniforge3/conda-bld/pytorch-recipe_1660136240338/work/aten/src/ATen/mps/MPSFallback.mm:11.)

and

.../stable-diffusion/ldm/modules/embedding_manager.py", line 155, in forward
    embedded_text[placeholder_idx] = placeholder_embedding
NotImplementedError: The operator 'aten::_index_put_impl_' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

I could try PYTORCH_ENABLE_MPS_FALLBACK, but is that how people are getting around this issue?
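
If you do go the fallback route, a minimal sketch of what it looks like (my own example, not from the repo) is to set the variable before PyTorch is imported, e.g. at the top of the launch script or in a tiny wrapper:

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before torch is imported

import torch
print(torch.backends.mps.is_available())  # True on Apple Silicon builds with MPS support

With the variable set, only the unsupported ops (aten::nonzero, aten::_index_put_impl_) run on the CPU; everything else should still run on MPS, so it is slower but not CPU-only.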

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 24

Most upvoted comments

Confirming that the issue persists with the latest PyTorch nightly, 1.13.0.dev20220901. It also looks like the aten::nonzero op hasn’t been implemented for the MPS backend in PyTorch yet.

[Screenshot from 2022-09-02 omitted]

Yes, at least whatever part of the code uses nonzero. My Mac’s GPU seems to be under 100% load during calls to SD, however (see Activity Monitor -> Window -> GPU History).

I am getting that same warning but images are generating in well under a minute so I think it is using the GPU. Just one more data point…
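
For anyone unsure whether their install is actually using the GPU, a quick sanity check (my own snippet, not part of the repo):

import torch
print(torch.backends.mps.is_built())      # PyTorch was compiled with MPS support
print(torch.backends.mps.is_available())  # the runtime can actually reach the GPU
x = torch.ones(1, device="mps")
print(x.device)                           # expect mps:0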

I’m having the same issue; it falls back to using the CPU. Please update if you find a fix.

No luck with mamba

Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement k-diffusion==0.0.1 (from versions: none)
ERROR: No matching distribution found for k-diffusion==0.0.1

Edit: I’ve managed to install all dependencies with mamba. Now it fails with ModuleNotFoundError: No module named 'ldm'
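
For anyone else hitting that error: ldm is a package that lives inside the stable-diffusion repository itself rather than on PyPI, so it only imports when the scripts are run from the repo root (or when the repo root is on sys.path). A minimal sketch, with a hypothetical checkout path:

import sys
sys.path.insert(0, "/path/to/stable-diffusion")  # hypothetical path; point it at your checkout

import ldm  # should now resolve instead of raising ModuleNotFoundError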

OK, I’ve won this fight with the modules, but I’m still getting that error and generation is extremely slow.

I can’t see anything obviously wrong with your log.

I installed using mamba (only because it’s faster, but I guess theoretically this could impact it).

I’ve just run git pull and tried the whole installation process again, starting with conda env create -f environment-mac.yaml.

If you want to try an alternative, I’ve exported my environment file here. Copy it to a file called thomasaarholt_env.yml.

Create a new environment with: conda env create -f thomasaarholt_env.yml (or mamba env ...)

Then I linked (or copied) the model downloaded from Hugging Face, ran python scripts/preload_models.py, and then:

❯ python scripts/dream.py --full_precision # I just tested, and the --full_precision argument doesn't appear necessary
* Initializing, be patient...

>> cuda not available, using device mps
>> Loading model from models/ldm/stable-diffusion-v1/model.ckpt
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Using slower but more accurate full-precision math (--full_precision)
>> Setting Sampler to k_lms
>> model loaded in 9.38s

* Initialization done! Awaiting your command (-h for help, 'q' to quit)
dream> A monkey hacking into the NSA
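
For the "linked (or copied) the model" step above, a minimal sketch of the symlink in Python (the source path is hypothetical; the destination matches the path the loader prints):

import os
# Run from the repo root so the relative destination path is correct.
# The source checkpoint path is hypothetical; adjust it to wherever you downloaded the weights.
os.symlink("/path/to/sd-v1-4.ckpt", "models/ldm/stable-diffusion-v1/model.ckpt")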

SD on M1 works fine. Use the environment-mac.yaml file when creating your Python environment with conda/mamba. I am running it right now on my M1 MacBook Pro.

The warning containing aten::nonzero is still present, but the image generation works fine.
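
To illustrate why the warning by itself isn’t fatal: with PYTORCH_ENABLE_MPS_FALLBACK=1 set, only the unsupported op runs on the CPU and its result is copied back to the GPU, while the rest of the model stays on MPS. A small sketch of my own (behaviour as of the nightlies discussed in this thread):

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # set before importing torch

import torch
x = torch.tensor([0.0, 1.0, 0.0, 2.0], device="mps")
idx = torch.nonzero(x)  # triggers the aten::nonzero CPU-fallback warning on these builds
print(idx.device)       # result is copied back to mps; surrounding ops stay on the GPU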

For me it isn’t so: I get the warning and it does fall back to the CPU, so generation time becomes very long, and I’ve never seen it get past 20% of 1 iteration at 5 steps. How can I get it not to fall back to the CPU?

M1 MBP 2020

Same issue with torch 1.13.0.dev20220901