ROCm: Doesn't ROCm support AMD's integrated GPU (APU)?

I have an AMD Ryzen 5 5600G processor with an integrated GPU, and I do not have a separate graphics card. I'm using Linux Mint 21 Cinnamon.
I installed PyTorch with this command:

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2

and according to posts on a forum, running the following is supposed to tell me whether PyTorch can use ROCm for CUDA-equivalent processing.

import torch.cuda
print(f'CUDA available? : {torch.cuda.is_available()}')

The output is False.

PyTorch redirects people to this repository’s readme page to check for compatible GPU information, but I didn’t see any. So for the sake of anyone searching for this info:

  1. Could you publish a list of the hardware you support, and which of it can be used with PyTorch or any other deep-learning library, as an alternative to CUDA?
  2. Could you please support integrated GPUs? Mine is supposed to be about as powerful as an NVIDIA GT 1030. With that much capability, it just isn't right to expect users to purchase a separate graphics card. I do hope ROCm breaks NVIDIA's monopoly on CUDA.

Update: Tried this script too, but the output is:

Checking ROCM support...
Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.

Tried installing ROCm via the instructions on this page (tried with the deb files for both bionic and focal).
On running sudo rocminfo, I get:

ROCk module is loaded
Segmentation fault

On running rocminfo:

ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
navin is not member of "render" group, the default DRM access group. Users must be a member of the "render" group or another DRM access group in order for ROCm applications to run successfully.

This script now outputs:

Checking ROCM support...
BAD: No ROCM devices found.
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
Cannot find rocminfo command information. Unable to determine if AMDGPU drivers with ROCM support were installed.
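The checks that script performs (rocminfo presence, user groups) can be sketched with plain standard-library Python. This is a minimal hypothetical version of such a diagnostic — the function names and messages are mine, not the actual script's:

```python
import os
import shutil
import subprocess


def check_rocminfo():
    """Report whether the rocminfo binary exists on PATH and runs cleanly."""
    path = shutil.which("rocminfo")
    if path is None:
        return "Cannot find rocminfo command information."
    try:
        out = subprocess.run([path], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return "BAD: rocminfo could not be executed."
    if out.returncode != 0:
        return "BAD: rocminfo exited with an error (e.g. a segfault)."
    return "GOOD: rocminfo ran successfully."


def check_groups():
    """Report whether the current session is in the render/video groups."""
    import grp
    names = set()
    for gid in os.getgroups():
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            pass  # gid with no name on this system
    missing = {"render", "video"} - names
    if missing:
        return f"BAD: user is missing groups: {sorted(missing)}"
    return "GOOD: The user is in RENDER and VIDEO groups."


print("Checking ROCM support...")
print(check_rocminfo())
print("Checking user groups...")
print(check_groups())
```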

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 6
  • Comments: 19

Most upvoted comments

Good news, guys. I found the solution. The problem was a version mismatch, and I don't think I had run this command: sudo amdgpu-install --usecase=hiplibsdk,rocm,dkms. I had to uninstall the installed PyTorch and select PyTorch's Nightly version, which matched my ROCm version. So here's the full procedure:

pip uninstall torch torchaudio torchdata torchvision
sudo apt-get install ./amdgpu-install_5.4.50403-1_all.deb
(or: sudo dpkg -i amdgpu-install_5.5.50502-1_all.deb)
sudo amdgpu-install --usecase=hiplibsdk,rocm,dkms
sudo amdgpu-install --list-usecase
sudo reboot
rocm-smi
sudo rocminfo

You have to run this after installing:

sudo usermod -a -G video <your_username>
sudo usermod -a -G render <your_username>
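Note that group changes made with usermod only take effect in new login sessions, which is a common reason the "Permission denied" on /dev/kfd persists after running them. A stdlib sketch of my own (not from the thread) to compare the groups recorded for your account against those active in the current session:

```python
import getpass
import grp
import os

user = getpass.getuser()

# Groups active in the *current* session (what /dev/kfd access checks see).
active = set()
for gid in os.getgroups():
    try:
        active.add(grp.getgrgid(gid).gr_name)
    except KeyError:
        pass  # gid with no name on this system

for name in ("render", "video"):
    try:
        # gr_mem lists supplementary members recorded in /etc/group;
        # it does not include users whose *primary* group this is.
        recorded = user in grp.getgrnam(name).gr_mem
    except KeyError:
        recorded = False  # group does not exist on this system
    print(f"{name}: recorded={recorded}, active in session={name in active}")
```

If a group shows recorded=True but active=False, log out and back in (or reboot) before retrying rocminfo.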

Check the ROCm version with:
apt show rocm-libs -a
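If you want to script that version check, something like this stdlib sketch (my own; it just scrapes the Version: lines from apt's output) could pull it out:

```python
import shutil
import subprocess


def rocm_libs_version():
    """Return the rocm-libs version string reported by apt, or None."""
    apt = shutil.which("apt")
    if apt is None:
        return None  # not a Debian/Ubuntu-style system
    try:
        out = subprocess.run([apt, "show", "rocm-libs", "-a"],
                             capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return None
    for line in out.stdout.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    return None  # package not known to apt


print("rocm-libs version:", rocm_libs_version())
```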

Now select the Nightly version of PyTorch (or whichever version matches your ROCm version):
[image: the PyTorch version-selection matrix from the install page]

and install PyTorch.

Now if you run this script, it’ll show:

Checking ROCM support...
GOOD: ROCM devices found:  2
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user nav is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of: 
--->  AMD Ryzen 5 5600G with Radeon Graphics  
--->  gfx90c  

and you can even run this script without the environment variable.

import torch.cuda
print(f'CUDA available? : {torch.cuda.is_available()}')

Note to the Radeon Open Compute developers:

When you close this issue, I’ll know that you’ve seen it. Please close it. However, I hope you realize two things:

  1. The installation process is complex, so it's not just me: many others will end up unable to solve the problem and will raise an issue here or ask on StackOverflow. Please build a script that auto-detects the system configuration, automatically installs all the necessary components, and creates the user groups automatically. After ROCm is installed, you could even show the user a message about what they need to do to get the right version of PyTorch, if they intend to use PyTorch.
  2. CUDA is mentioned and advertised so much on the internet that even an experienced developer like me initially didn't know ROCm was a way to use the GPU on AMD hardware. You need to do a lot more advertising and blogging to make developers aware of ROCm. I'm looking forward to using it with Modular's Mojo framework for AI.
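For what it's worth, the system-detection part of such an installer script could start as simply as this. This is a stdlib sketch of my own; the "jammy" check is illustrative only, not AMD's actual compatibility matrix:

```python
import platform
from pathlib import Path


def os_release():
    """Parse /etc/os-release into a dict (empty if the file is absent)."""
    info = {}
    p = Path("/etc/os-release")
    if p.exists():
        for line in p.read_text().splitlines():
            if "=" in line:
                key, _, value = line.partition("=")
                info[key] = value.strip('"')
    return info


info = os_release()
codename = info.get("UBUNTU_CODENAME", info.get("VERSION_CODENAME", "unknown"))
print("Base distro codename:", codename)
print("Kernel:", platform.release())

# Illustrative check only: derivatives like Linux Mint 21.x report
# UBUNTU_CODENAME=jammy even though ID is not "ubuntu".
if codename == "jammy":
    print("Base system looks like Ubuntu 22.04 (jammy).")
else:
    print("Unrecognized base system; a real installer would warn here.")
```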

Thanks for trying to help. Hope y’all would automate the install process.

One more thing. Just as NVIDIA created the CUDA toolkit, it would help if AMD created a module or interface that lets any application use the GPU seamlessly, whether the GPU is on a graphics card or integrated into the processor, and irrespective of the version.

  • I tried OpenLLM’s basic example code, and it crashes with Segmentation fault (core dumped).

  • Tried Vicuna via these instructions, but ended up with FAILED: GPTQ-for-LLaMa/build/temp.linux-x86_64-3.9/quant_hip_kernel.o, which, I found out, happens when the GPU version isn't the correct one.

  • Tried a simple Convolutional Neural Network with PyTorch, after ensuring I moved my models and data to the GPU, and got this error: rocBLAS error: Cannot read /home/nav/.pyenv/versions/3.9.17/lib/python3.9/site-packages/torch/lib/rocblas/library/TensileLibrary.dat: Illegal seek. Aborted (core dumped).

This and this issue speak volumes about how much more needs to be done to improve ROCm support. People are feeling cheated.

This is the right time to ask your managers for more time and resources to build a good architecture for ROCm to support machine learning. Please do so.

Run it like the following:

# HSA_OVERRIDE_GFX_VERSION=9.0.0 python3
Python 3.8.16 (default, Mar  2 2023, 03:21:46)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch.cuda
>>> print(f'CUDA available? : {torch.cuda.is_available()}')
CUDA available? : True
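If you'd rather not prefix every invocation with the variable, it can also be set from inside Python, provided that happens before torch is first imported, since the HIP runtime reads it at initialization. A small sketch; the try/except just keeps it runnable on machines where PyTorch isn't installed:

```python
import os

# Must be set before the first `import torch`; the HIP runtime reads
# this environment variable when it initializes.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"  # gfx90c -> 9.0.0

try:
    import torch  # imported deliberately after the override is in place
    print(f"CUDA available? : {torch.cuda.is_available()}")
except ImportError:
    print("PyTorch is not installed in this environment.")
```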

@nav9 I can confirm that it also does not work with rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2. I got the error RuntimeError: No HIP GPUs are available while trying to use gfx90c with HSA_OVERRIDE_GFX_VERSION=9.0.0.

I also attach the rocminfo output (looks like all is good): rocminfo.txt

@nav9: Many thanks for the follow-up. Yeah, I also experienced hanging in some builds; I tried a massive number of combinations with no luck. Well, at least I know I'm not going crazy, thanks to your response. I found several claims that people got gfx902 working, but I doubt them now.

Unfortunately, I also doubt AMD will help. For any ML work, they are driving people to NVIDIA in herds… Why would I buy an AMD card, or even an APU, when chances are high it's dead metal for any ML/AI work?

Thanks again for your help here.

Thank you for your feedback and persistence to get this resolved. Good job! We will close this issue.

Somebody mentioned that I had missed the HSA_OVERRIDE_GFX_VERSION=9.0.0 environment variable, and then deleted their comment. Well, I tried with the environment variable, and now it works! CUDA available shows True. So thank you very much.
However, sudo rocminfo still shows:

ROCk module is loaded
Segmentation fault

and this script, run as HSA_OVERRIDE_GFX_VERSION=9.0.0 python3 trial.py, outputs:

Checking ROCM support...
BAD: No ROCM devices found.
Checking PyTorch...
GOOD: PyTorch is working fine.
Checking user groups...
GOOD: The user nav is in RENDER and VIDEO groups.
GOOD: PyTorch ROCM support found.
Testing PyTorch ROCM support...
Everything fine! You can run PyTorch code inside of: 

Which is surprising, because if no ROCM devices were found, then how is everything fine?

And during installation, the commands amdgpu-install -y --usecase=rocm and sudo apt install rocm-hip-sdk result in the error:

The following packages have unmet dependencies:
 rocm-hip-runtime : Depends: rocminfo (= 1.0.0.50403-121~22.04) but 5.0.0-1 is to be installed
 rocm-utils : Depends: rocminfo (= 1.0.0.50403-121~22.04) but 5.0.0-1 is to be installed
E: Unable to correct problems, you have held broken packages.

Distribution:
This page says Ubuntu 22.04.2 is supported, and I believe that's what my Linux Mint 21.1 x86_64 system is based on, because it shows UBUNTU_CODENAME=jammy, which corresponds to Ubuntu 22.04. However, the supported kernels are 5.19.0-45-generic and 5.15.0-75-generic, while my kernel is Linux 5.15.0-72-generic #79-Ubuntu SMP, so I wonder whether this could also have been a source of the problem.

PS: I really hope AMD can do something to make this installation and usage simple. It's such a good chance to compete, now that AI is getting really big.