diffusers: Training memory optimizations not working on AMD hardware

Describe the bug

The Dreambooth training example has a section about training on a 16 GB GPU. Since Radeon Navi 21 series cards all ship with 16 GB of VRAM, this could in theory expand the range of hardware that can train models by a large margin.

The problem is that, at least out of the box, neither of the optimizations (--gradient_checkpointing, --use_8bit_adam) seems to work on AMD cards.

Reproduction

Using the example command with PyTorch ROCm 5.1.1 (pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.1.1):

--gradient_checkpointing: raises AttributeError: 'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing'

--use_8bit_adam: throws a handful of CUDA errors; see the Logs section below for the main part. (Is bitsandbytes Nvidia-specific, and if so, is there an AMD implementation available?)
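As a side note, a training script could guard these optional optimizations with feature detection so an older diffusers build fails with a clear message instead of an AttributeError mid-setup. This is a hedged, stdlib-only sketch (OldUNet and try_enable_gradient_checkpointing are hypothetical stand-ins, not diffusers code):

```python
# Hypothetical stand-in for diffusers 0.3.0's UNet2DConditionModel,
# which does not yet have enable_gradient_checkpointing.
class OldUNet:
    pass

def try_enable_gradient_checkpointing(unet) -> bool:
    """Enable gradient checkpointing if the model supports it."""
    if hasattr(unet, "enable_gradient_checkpointing"):
        unet.enable_gradient_checkpointing()
        return True
    return False

# On a 0.3.0-style model the guard reports failure instead of crashing.
print(try_enable_gradient_checkpointing(OldUNet()))  # False
```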

Logs

using --gradient_checkpointing:

The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_cpu_threads_per_process` was set to `12` to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
/opt/amdgpu/share/libdrm/amdgpu.ids: No such file or directory
Traceback (most recent call last):
  File "/home/foobar/diffusers/examples/dreambooth/train_dreambooth.py", line 606, in <module>
    main()
  File "/home/foobar/diffusers/examples/dreambooth/train_dreambooth.py", line 408, in main
    unet.enable_gradient_checkpointing()
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'UNet2DConditionModel' object has no attribute 'enable_gradient_checkpointing'
Traceback (most recent call last):
  File "/home/foobar/diffusers/.venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

using --use_8bit_adam:

...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/bitsandbytes/cuda_setup/paths.py:20: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
  warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary /home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home/foobar/pyenvtest/.venv/lib/python3.9/site-packages/bitsandbytes/cextension.py:48: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
...


System Info

- `diffusers` version: 0.3.0
- Platform: Linux-5.15.67-x86_64-with-glibc2.34
- Python version: 3.9.13
- PyTorch version (GPU?): 1.12.1+rocm5.1.1 (True)
- Huggingface_hub version: 0.9.1
- Transformers version: 4.22.2
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 18 (4 by maintainers)

Most upvoted comments

Last night I came across a fork of bitsandbytes called bitsandbytes-rocm, and I can confirm that it does in fact work on AMD hardware and lets me use Dreambooth, at least (I have not tested any of the other projects in this repo). With an RX 6900 XT, I successfully ran Dreambooth at 13 GB of VRAM utilization without xformers.

We will release a new diffusers version very soon!

Not related to AMD

The --use_8bit_adam problems potentially are: bitsandbytes includes a C extension that wraps CUDA functions directly, i.e., it doesn't run through pytorch-rocm. Not really anything that can be fixed on this end, though.
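As an analogy for why a ROCm PyTorch build can't rescue a CUDA-only extension, here is a stdlib-only sketch (this is not bitsandbytes' actual code): a native library loaded directly from Python calls into the shared object itself, so nothing routes those calls through PyTorch's backend dispatch.

```python
import ctypes
import ctypes.util

# Load the C math library directly, the way a native extension binds to
# its shared object. Calls made this way bypass any Python-level framework,
# just as bitsandbytes' CUDA calls bypass pytorch-rocm's HIP translation.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # 3.0 -- computed by libm, not by Python
```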

I’m throwing in a comment to try to keep the issue active; I’ll eventually be able to test this again, just in case it did get fixed.

I’m not really familiar with AMD GPUs, maybe @NouamaneTazi has some ideas here 😃

The method was added in this commit (10 days ago): https://github.com/huggingface/diffusers/commit/e7120bae955949355e50a4c2adef8353790fda0d

The latest release was 23 days ago, so it doesn't include that change yet.

Installing from main should solve the issue:

pip install git+https://github.com/huggingface/diffusers.git
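After reinstalling, a quick sanity check (assuming diffusers imports cleanly in the environment) can confirm the method is now present before re-running training:

```shell
# Should print True on a build from main that includes commit e7120ba.
python -c "from diffusers import UNet2DConditionModel; \
print(hasattr(UNet2DConditionModel, 'enable_gradient_checkpointing'))"
```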

The same issue and solution were already explained in #656; I'll leave my explanation here just in case.