ROCm: Regression in ROCm 5.3 and newer for gfx1010

Since PyTorch 2 was officially released, I haven’t been able to run it on my 5700 XT, even though it previously worked just fine on PyTorch 1.13.1 by setting “export HSA_OVERRIDE_GFX_VERSION=10.3.0”. Many people are reporting the same issue on the 5000 series, for example https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/6420
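
For reference, a minimal sketch of how that workaround is applied (the export is the one from this report; the python one-liner is just a quick sanity check and an assumption on my part):

    # Set the override before the ROCm runtime initializes, i.e. before starting python.
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"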

--precision full and --no-half are also needed because the card apparently can’t use fp16 on Linux/ROCm, as already reported here https://github.com/RadeonOpenCompute/ROCm/issues/1857
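
A rough way to see the fp16 problem on a given build (a sketch, assuming a ROCm build of PyTorch and a visible GPU; the flags above are the ones AUTOMATIC1111’s webui accepts):

    # fp32 path (reported to work on gfx1010):
    python3 -c "import torch; x = torch.randn(1024, 1024, device='cuda'); print(torch.isfinite(x @ x).all().item())"
    # fp16 path (the one reported to misbehave, hence --precision full --no-half):
    python3 -c "import torch; x = torch.randn(1024, 1024, device='cuda').half(); print(torch.isfinite(x @ x).all().item())"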

I also read about the PCIe atomics requirement, following this issue https://github.com/pytorch/pytorch/issues/103973 …but that doesn’t seem to be my case. The command “grep flags /sys/class/kfd/kfd/topology/nodes/*/io_links/0/properties” returns:

/sys/class/kfd/kfd/topology/nodes/0/io_links/0/properties:flags 3
/sys/class/kfd/kfd/topology/nodes/1/io_links/0/properties:flags 1

Also, I tried to compile PyTorch using the new “-mprintf-kind=buffered” flag, but it didn’t change anything.

Finally, I recently found out that PyTorch 2 works just fine on gfx1010 if it’s compiled with ROCm 5.2, as suggested here https://github.com/pytorch/pytorch/issues/106728

About this issue

  • State: open
  • Created 9 months ago
  • Reactions: 1
  • Comments: 23

Most upvoted comments

OK, we will tackle this issue next @kmsedu @DGdev91

What’s your motivation for using newer ROCm? Do you expect better performance?

Well, for example, to be able to use the official PyTorch builds instead of relying on old nightlies or compiling from source.

Last time I tried, I got a memory access error […] when trying to load a model in llama.cpp with both hipBLAS and CLBlast offloading, while the latter worked fine on Windows. I had the same problem on both Arch Linux and Ubuntu.

To be clear, on Ubuntu were you using libhipblas-dev (which installs to /usr/lib/x86_64-linux-gnu) or were you using hipblas-dev (which installs to /opt/rocm/lib)? If you were using libhipblas-dev, I’m very interested in learning more. Could you provide some instructions on how to reproduce the problem?

That’s also true for every consumer-grade AMD GPU other than the 7900 XT and 7900 XTX, which many users are still using thanks to the same workaround.

Using HSA_OVERRIDE_GFX_VERSION=10.3.0 on RDNA 2 GPUs is fundamentally different from using it on RDNA 1 GPUs. All RDNA 2 GPUs use the exact same instructions, but there’s a bunch of differences between the instructions used on RDNA 1 and RDNA 2 GPUs. The only way to undo this ‘regression’ with HSA_OVERRIDE_GFX_VERSION would be to change LLVM so that the compiler only uses instructions available on RDNA 1, even when asked to compile for RDNA 2. That’s not going to happen.

A better path to getting gfx1010 enabled in PyTorch would be to build the ROCm math and AI libraries for gfx1010 (or gfx10.1-generic). That is probably not going to happen in AMD’s official packages, but there are other groups building and distributing ROCm packages. I can’t speak for other distributions, but I expect to have it enabled later this year on Debian. With that said, my work with Debian is strictly volunteer work (on top of my full-time job), so don’t expect it to happen quickly.
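
As a rough illustration of that path, this is approximately what building rocBLAS with gfx1010 enabled looks like (a sketch; the -a/--architecture option and the AMDGPU_TARGETS CMake variable are assumptions to verify against the build docs of the release being built):

    # Build rocBLAS from source with gfx1010 kernels included (verify the exact
    # options with ./install.sh --help for the ROCm release you target).
    git clone https://github.com/ROCm/rocBLAS && cd rocBLAS
    ./install.sh -d -a gfx1010        # or configure with cmake -DAMDGPU_TARGETS=gfx1010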

OK, now it’s clearer. I can confirm I used hipblas-dev.

I was also thinking that the HSA override flag was needed for rocBLAS too, because I couldn’t use it natively on gfx1010, since the libraries for gfx1010 were missing from the official packages.
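
A quick way to check which gfx targets an installed rocBLAS/Tensile build actually ships kernels for (a sketch; the library directory has moved between ROCm releases, so one of these paths may not exist on a given install):

    # No gfx1010 entries means native (non-overridden) gfx1010 cannot work with that rocBLAS build.
    ls /opt/rocm/lib/rocblas/library 2>/dev/null | grep -o 'gfx[0-9a-f]*' | sort -u
    ls /opt/rocm/rocblas/lib/library 2>/dev/null | grep -o 'gfx[0-9a-f]*' | sort -u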

I also just found this PR, merged just 5 days ago, which makes compiling the Tensile libraries for gfx1010 a bit simpler: https://github.com/ROCm/Tensile/pull/1862

Hey there. No, I’m not running the latest version of PyTorch. And yes, you are correct, I’ve run this with AUTOMATIC1111’s webui with (your 🙏 ❤️) workaround in place. Setting HSA_OVERRIDE_GFX_VERSION=9.4.0 made the setup automatically downgrade PyTorch to the version built with ROCm 5.2.

OK, then you have exactly the same problem. We know that anything compiled with ROCm 5.2 or older works just fine on that card. If you try to force a newer version in webui_user.sh, it’s probably not going to work. Also, HSA_OVERRIDE_GFX_VERSION=10.3.0 is usually used for the override; AUTOMATIC1111’s webui should set it automatically for older GPUs, so maybe you were actually using that.
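
For completeness, a sketch of pinning that known-good combination in webui_user.sh (variable names as in the stock file; the rocm5.2 wheel index is the one PyTorch 1.13.1 was published against, so treat the exact versions as something to double-check):

    # webui_user.sh: pin the override, the fp16 workaround flags, and the last
    # PyTorch build known to work on gfx1010 (built against ROCm 5.2).
    export HSA_OVERRIDE_GFX_VERSION=10.3.0
    export COMMANDLINE_ARGS="--precision full --no-half"
    export TORCH_COMMAND="pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --extra-index-url https://download.pytorch.org/whl/rocm5.2"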

Since gfx1010 is not in the supported gfx target list (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-gpus), the latest versions may not work for your GPU.

I know. But it’s still weird that a GPU which worked perfectly fine with PyTorch compiled for an older ROCm version + HSA_OVERRIDE_GFX_VERSION=10.3.0 (even if not officially supported) suddenly stops working with everything compiled against something newer. I also tried to pick up an old Docker image with ROCm 5.2, but it doesn’t seem able to compile it.

This is also true for other software which relies on ROCm, like llama.cpp with hipBLAS support.