ROCm: Doesn't ROCm support AMD's integrated GPU (APU)?
I have an AMD Ryzen 5 5600G processor, which has an integrated GPU, and I do not have a separate graphics card. I am using Linux Mint 21 Cinnamon.
I installed PyTorch with this command:

```
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
```

According to posts on a forum, running the following is supposed to tell me whether PyTorch can use ROCm to perform CUDA-equivalent processing:
```python
import torch
print(f'CUDA available? : {torch.cuda.is_available()}')
```

The output is `False`.
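For reference, here is a slightly fuller probe than the two lines above (a minimal sketch; `torch.version.hip` is populated only on ROCm builds of PyTorch):

```python
# Minimal sketch: probe what this PyTorch build can see.
# torch.cuda.* is the same front-end API PyTorch uses for ROCm/HIP devices.
import torch

print(f"HIP (ROCm) build : {torch.version.hip}")  # None on CUDA-only builds
print(f"Device available : {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device name      : {torch.cuda.get_device_name(0)}")
```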
PyTorch redirects people to this repository's README page to check for compatible GPU information, but I didn't see any there. So, for the sake of anyone searching for this info:
- Could you publish a list of the hardware you support and which of it can be used with PyTorch or any other deep learning library, as an alternative to CUDA?
- Could you please support integrated GPUs? Mine is supposed to be as powerful as an NVIDIA GT 1030. When it is that powerful, it just isn't right to expect users to purchase a separate graphics card. I do hope ROCm breaks NVIDIA's monopoly on CUDA.
Update: I tried this script too, but the output is:
```
Checking ROCM support...
Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.
```
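(A check like that can be approximated in a few lines; this is a minimal sketch of the idea, not the actual script:)

```python
# Minimal sketch (not the actual script): rocminfo ships with ROCm, so its
# absence on PATH suggests the ROCm stack is not installed.
import shutil
import subprocess

rocminfo = shutil.which("rocminfo")
if rocminfo is None:
    print("Cannot find rocminfo; ROCm does not appear to be installed.")
else:
    result = subprocess.run([rocminfo], capture_output=True, text=True)
    print(result.stdout or result.stderr)
```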
I tried installing ROCm via the instructions on this page (tried with the deb files for both bionic and focal).
On running `sudo rocminfo`, I get:
```
ROCk module is loaded
Segmentation fault
```
On running `rocminfo`:
```
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
navin is not member of "render" group, the default DRM access group. Users must be a member of the "render" group or another DRM access group in order for ROCm applications to run successfully.
```
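(The fix for this is the `usermod` step shown later in this thread. A quick membership check from Python, as a minimal sketch:)

```python
# Minimal sketch: check whether the current user's groups include "render",
# the default DRM access group that ROCm needs for /dev/kfd access.
import grp
import os

try:
    render_gid = grp.getgrnam("render").gr_gid
    print("In render group?", render_gid in os.getgroups())
except KeyError:
    print('No "render" group exists on this system.')
```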
This script now outputs:
```
Checking ROCM support...
BAD: No ROCM devices found.
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.
```
Good news, guys. I found the solution. The problem was a version mismatch, and I don't think I had run this command: `sudo amdgpu-install --usecase=hiplibsdk,rocm,dkms`. I had to uninstall the installed PyTorch and select PyTorch's Nightly version, which matched my ROCm version. So here's the full procedure:

```
pip uninstall torch torchaudio torchdata torchvision
sudo apt-get install ./amdgpu-install_5.4.50403-1_all.deb
# or
sudo dpkg -i amdgpu-install_5.5.50502-1_all.deb
sudo amdgpu-install --usecase=hiplibsdk,rocm,dkms
sudo amdgpu-install --list-usecase
sudo reboot
rocm-smi
sudo rocminfo
```

You have to run this after installing:
```
sudo usermod -a -G video <your_username>
sudo usermod -a -G render <your_username>
```

Check the ROCm version with:
```
apt show rocm-libs -a
```

Now select the Nightly version of PyTorch (or whichever version matches your ROCm version):

and install PyTorch.
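To confirm the versions actually match after installing, a check like this works (a minimal sketch; `torch.version.hip` is only set on ROCm builds of PyTorch):

```python
# Minimal sketch: compare the ROCm (HIP) version PyTorch was built against
# with the version reported by `apt show rocm-libs -a`.
import torch

print(f"PyTorch version   : {torch.__version__}")  # ROCm wheels end in e.g. "+rocm5.5"
print(f"Built against HIP : {torch.version.hip}")  # None on non-ROCm builds
```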
Now if you run this script, it’ll show:
and you can even run this script without the environment variable.
Note to the Radeon Open Compute developers:
When you close this issue, I’ll know that you’ve seen it. Please close it. However, I hope you realize two things:
Thanks for trying to help. I hope y'all automate the install process.
One more thing: just as NVIDIA created the CUDA toolkit, it would help if AMD also created a module or interface that lets any application seamlessly use the GPU, whether the GPU is on a graphics card or on the processor. It should also work irrespective of the version.
I tried OpenLLM's basic example code, and it crashes with:

```
Segmentation fault (core dumped)
```

Tried Vicuna via these instructions, but ended up with:

```
FAILED: GPTQ-for-LLaMa/build/temp.linux-x86_64-3.9/quant_hip_kernel.o
```

which, I found out, happens when the GPU version isn't the correct one. Tried a simple convolutional neural network with PyTorch, after ensuring I moved my models and data to the GPU, and got this error:

```
rocBLAS error: Cannot read /home/nav/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek
Aborted (core dumped)
```

This and this issue speak volumes about how much more needs to be done to improve ROCm support. People are feeling cheated.
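(For reference, the model/data device-move pattern mentioned above is the standard one; a minimal sketch with a toy layer standing in for the CNN:)

```python
# Minimal sketch: the standard "move model and data to the GPU" pattern.
# On ROCm builds of PyTorch, "cuda" is also the device string for AMD GPUs.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Conv2d(3, 8, kernel_size=3).to(device)  # toy stand-in for the CNN
batch = torch.randn(1, 3, 32, 32, device=device)   # data created on the same device
output = model(batch)
print(output.shape, output.device)
```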
This is the right time to ask your managers to provide you more time and resources to build a good architecture for ROCm to support machine learning. Please do so.
Run it like the following:
@nav9 I can confirm that it also does not work with `rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2`. I got the error `RuntimeError: No HIP GPUs are available` while trying to use gfx90c with `HSA_OVERRIDE_GFX_VERSION=9.0.0`. I also attach the rocminfo output (looks like all good): rocminfo.txt
@nav9: Many thanks for the follow-up. Yeah, I also experienced hangs in some builds - I tried a massive number of combinations, with no luck. Well, at least I know I am not going crazy, thanks to your response. I found several claims of people getting gfx902 working, but I doubt them now.
Unfortunately, I also doubt AMD will help. For any ML work, they are driving people to NVIDIA in herds… Why would I buy an AMD card or even an APU when the chances are high that it is dead metal for any ML/AI work?
Thanks again for your help here.
Thank you for your feedback and persistence in getting this resolved. Good job! We will close this issue.
Somebody mentioned that I missed the `HSA_OVERRIDE_GFX_VERSION=9.0.0` environment variable, and then deleted their comment. Well, I tried with the environment variable, and now it works! CUDA available shows `True`. So thank you very much.

However, `sudo rocminfo` still shows the same output as before, and this script, run as

```
HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 trial.py
```

still outputs `BAD: No ROCM devices found.` alongside `GOOD: PyTorch is working fine.` Which is surprising, because if no ROCM devices were found, then how is everything fine?
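(If you'd rather not prefix every run, the override can also be set from inside the script, as long as it happens before anything touches the GPU; a minimal sketch, assuming the same `9.0.0` override:)

```python
# Minimal sketch: set the gfx override before the HIP runtime initializes.
# The ROCm runtime reads HSA_OVERRIDE_GFX_VERSION when it starts up, so this
# must run before the first torch.cuda call.
import os
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"

import torch
print(f"CUDA available? : {torch.cuda.is_available()}")
```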
And during installation, the commands `amdgpu-install -y --usecase=rocm` and `sudo apt install rocm-hip-sdk` result in an error.

Distribution:
This page says `Ubuntu 22.04.2` is supported, and that's what I believe my Linux Mint 21.1 x86_64 uses, because it shows `UBUNTU_CODENAME=jammy`, which is supposed to be Ubuntu 22.04.2. However, the supported kernels are `5.19.0-45-generic` and `5.15.0-75-generic`, and my kernel is `Linux 5.15.0-72-generic #79-Ubuntu SMP`. So I wonder if this could also have been a source of the problem.

PS: I really hope AMD can do something about making this simple to install and use. It's such a good chance to compete, now that AI is going really big.
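(For anyone comparing their own setup against the supported list, a quick way to check the codename and kernel from one place; a minimal sketch:)

```python
# Minimal sketch: print the Ubuntu codename and the running kernel, to compare
# against ROCm's supported-distribution and supported-kernel lists.
import platform

with open("/etc/os-release") as f:
    for line in f:
        if line.startswith(("VERSION_ID=", "UBUNTU_CODENAME=")):
            print(line.strip())
print("Kernel:", platform.release())
```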