diffusers: RuntimeError: CUDA error: invalid argument when using xformers

Describe the bug

When trying to run train_dreambooth.py with --enable_xformers_memory_efficient_attention the process exits with this error:

RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps:   0%|                                                                                                                          | 0/400 [00:07<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/*****/anaconda3/envs/sd-gpu/bin/accelerate:8 in <module>                                  │
│                                                                                                  │
│   5 from accelerate.commands.accelerate_cli import main                                          │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/accelerate_c │
│ li.py:45 in main                                                                                 │
│                                                                                                  │
│   42 │   │   exit(1)                                                                             │
│   43 │                                                                                           │
│   44 │   # Run                                                                                   │
│ ❱ 45 │   args.func(args)                                                                         │
│   46                                                                                             │
│   47                                                                                             │
│   48 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/launch.py:11 │
│ 04 in launch_command                                                                             │
│                                                                                                  │
│   1101 │   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA  │
│   1102 │   │   sagemaker_launcher(defaults, args)                                                │
│   1103 │   else:                                                                                 │
│ ❱ 1104 │   │   simple_launcher(args)                                                             │
│   1105                                                                                           │
│   1106                                                                                           │
│   1107 def main():                                                                               │
│                                                                                                  │
│ /home/*****/anaconda3/envs/sd-gpu/lib/python3.10/site-packages/accelerate/commands/launch.py:56 │
│ 7 in simple_launcher                                                                             │
│                                                                                                  │
│    564 │   process = subprocess.Popen(cmd, env=current_env)                                      │
│    565 │   process.wait()                                                                        │
│    566 │   if process.returncode != 0:                                                           │
│ ❱  567 │   │   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)       │
│    568                                                                                           │
│    569                                                                                           │
│    570 def multi_gpu_launcher(args):                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
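The error message's own hint is the first debugging step: with `CUDA_LAUNCH_BLOCKING=1`, kernel launches become synchronous and the traceback points at the call that actually failed. A minimal sketch of setting it from Python (note it only takes effect if set before the CUDA context is created, so in practice before importing torch):

```python
import os

# Force synchronous CUDA kernel launches so errors surface at the call
# that caused them instead of at some later API call. This must be set
# before the CUDA context is initialized, i.e. before the first CUDA
# operation (and in practice before importing torch at all).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

Equivalently, prefix the launch command: `CUDA_LAUNCH_BLOCKING=1 accelerate launch train_dreambooth.py ...`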

Reproduction

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4 \
  --instance_data_dir=./inputs \
  --output_dir=./outputs \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400 \
  --enable_xformers_memory_efficient_attention

Logs

No response

System Info

  • diffusers version: 0.12.0.dev0
  • Platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
  • Python version: 3.10.8
  • PyTorch version (GPU?): 1.13.0 (True)
  • Huggingface_hub version: 0.11.1
  • Transformers version: 0.15.0
  • Accelerate version: not installed
  • xFormers version: 0.0.15.dev395+git.7e05e2c
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: single GPU

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 19 (13 by maintainers)

Most upvoted comments

Thanks for the tip. I had the same issue. I solved it by installing the xformers pre-release package @patil-suraj mentioned and updating PyTorch to 1.13.1+cu117.

While I’m no longer getting an error, it looks like the model doesn’t learn anymore: the images generated after training are the same as before it.

However, I’ve found an older version of xformers which works just fine: https://github.com/facebookresearch/xformers/commit/0bad001ddd56c080524d37c84ff58d9cd030ebfd. This seems to be the last commit that works for me, as far as I can tell from a few tests using later commits.

Here’s my environment and installation process.

  • GPU: 3060
  • CUDA version: 11.8
  • Python version: 3.10
  • OS: Arch Linux

Installation:

cd examples/dreambooth
pip install \
    -r requirements.txt \
    git+https://github.com/huggingface/diffusers.git@7c82a16fc14840429566aec40eb9e65aa57005fd \
    torch==1.13.1 \
    bitsandbytes==0.35.1 \
    triton==2.0.0.dev20221202 \
    scikit-learn \
    datasets \
    ninja
pip install git+https://github.com/facebookresearch/xformers.git@0bad001ddd56c080524d37c84ff58d9cd030ebfd

If nvcc is not on $PATH (like on Arch Linux), you can change the last line and specify the path to cuda like this:

PATH="$PATH:/opt/cuda/bin" pip install git+https://github.com/facebookresearch/xformers.git@0bad001ddd56c080524d37c84ff58d9cd030ebfd
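The same fallback can be expressed programmatically. A small sketch (the `find_nvcc` helper is mine, not from any library) that searches `$PATH` and then known CUDA install directories:

```python
import os
import shutil

def find_nvcc(extra_dirs=("/opt/cuda/bin",)):
    """Return the path to nvcc, searching $PATH first and then extra_dirs.

    Returns None if nvcc is not found anywhere, in which case building
    xformers from source will fail.
    """
    search_path = os.pathsep.join([os.environ.get("PATH", ""), *extra_dirs])
    return shutil.which("nvcc", path=search_path)
```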

Some details about versions:

  • ninja is installed to build xformers faster
  • bitsandbytes must be 0.35 because of this. Also, training with 0.35.4 makes the model generate blue noise for me, while 0.35.1 works fine.
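Since this setup is sensitive to exact versions, a quick sanity check against the pins can save a failed training run. A sketch (the `check_pins` helper and `PINS` dict are mine; the pins are taken from the install commands above) using the standard library's `importlib.metadata`:

```python
from importlib.metadata import PackageNotFoundError, version

# Pins from the install commands above; adjust to your own environment.
PINS = {
    "torch": "1.13.1",
    "bitsandbytes": "0.35.1",
    "triton": "2.0.0.dev20221202",
}

def check_pins(pins):
    """Return {package: installed_version_or_None} for every mismatched pin."""
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != wanted:
            mismatches[pkg] = installed
    return mismatches
```

An empty return value means every pinned package matches.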
Full package version list
absl-py                  1.4.0
accelerate               0.15.0
aiohttp                  3.8.3
aiosignal                1.3.1
async-timeout            4.0.2
attrs                    22.2.0
bitsandbytes             0.35.1
cachetools               5.2.1
certifi                  2022.12.7
charset-normalizer       2.1.1
cmake                    3.25.0
datasets                 2.8.0
diffusers                0.12.0.dev0
dill                     0.3.6
exceptiongroup           1.1.0
filelock                 3.9.0
frozenlist               1.3.3
fsspec                   2022.11.0
ftfy                     6.1.1
google-auth              2.16.0
google-auth-oauthlib     0.4.6
grpcio                   1.51.1
huggingface-hub          0.11.1
idna                     3.4
importlib-metadata       6.0.0
iniconfig                2.0.0
Jinja2                   3.1.2
joblib                   1.2.0
Markdown                 3.4.1
MarkupSafe               2.1.2
modelcards               0.1.6
multidict                6.0.4
multiprocess             0.70.14
mypy-extensions          0.4.3
ninja                    1.11.1
numpy                    1.24.1
nvidia-cublas-cu11       11.10.3.66
nvidia-cuda-nvrtc-cu11   11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11        8.5.0.96
oauthlib                 3.2.2
packaging                23.0
pandas                   1.5.3
Pillow                   9.4.0
pip                      22.3.1
pluggy                   1.0.0
protobuf                 3.20.3
psutil                   5.9.4
pyarrow                  10.0.1
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pyre-extensions          0.0.23
python-dateutil          2.8.2
pytz                     2022.7.1
PyYAML                   6.0
regex                    2022.10.31
requests                 2.28.2
requests-oauthlib        1.3.1
responses                0.18.0
rsa                      4.9
scikit-learn             1.2.0
scipy                    1.10.0
setuptools               65.5.0
six                      1.16.0
tensorboard              2.11.2
tensorboard-data-server  0.6.1
tensorboard-plugin-wit   1.8.1
threadpoolctl            3.1.0
tokenizers               0.13.2
tomli                    2.0.1
torch                    1.13.1
torchvision              0.14.1
tqdm                     4.64.1
transformers             4.25.1
triton                   2.0.0.dev20221202
typing_extensions        4.4.0
typing-inspect           0.8.0
urllib3                  1.26.14
wcwidth                  0.2.6
Werkzeug                 2.2.2
wheel                    0.38.4
xformers                 0.0.15.dev0+0bad001.d20230119
xxhash                   3.2.0
yarl                     1.8.2
zipp                     3.11.0

Edit: seems to work with both torch 1.12.1 and 1.13.1, updated the version information.

Those two are the versions where it definitely works 😃. The architecture I know has issues is SM8x except SM80 (so 30xx and 40xx cards, mostly).
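The SM naming maps onto `torch.cuda.get_device_capability()`, which returns a `(major, minor)` pair: SM86 is `(8, 6)`, SM80 is `(8, 0)`. A tiny sketch (the helper name is mine) of the rule stated above:

```python
def is_known_bad_arch(major, minor):
    """SM8x except SM80 is reported problematic (RTX 30xx = SM 8.6, 40xx = SM 8.9)."""
    return major == 8 and minor != 0

# With a GPU present you would check your own card like this:
#   import torch
#   is_known_bad_arch(*torch.cuda.get_device_capability())
```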

(Although it looks like there’s a bit more action in the xformers repo, so this might actually get fixed upstream at some point now.)

I tried this and the 0.17 pre-release.

I’ll report in xformers, but I believe I found a related issue there already.

On Wed, Feb 1, 2023 at 3:44 AM Suraj Patil wrote:

I don’t know if it fixes it, but there has been a new release for xformers yesterday: https://github.com/facebookresearch/xformers/releases/tag/v0.0.16


Could be an issue with the xformers version; I have been using the xformers pre-release and it seems to be working without any issues: https://pypi.org/project/xformers/#history

@davidpfahler in the meantime, using this to enable xformers instead of the built-in enable xformers method should work:

https://github.com/cloneofsimo/lora/blob/master/lora_diffusion/xformers_utils.py#L42

This might be an upstream bug in xformers https://github.com/facebookresearch/xformers/issues/563