ROCm: 4.1.0 version stopped working on Fedora 33

Hello. After today's update from 4.0.0 -> 4.1.0, OpenCL is not working anymore:

❯ rocminfo

ROCk module is loaded
HSA Error:  Incompatible kernel and userspace, Vega 20 [Radeon VII] disabled. Upgrade amdgpu.
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 3 3300X 4-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 3 3300X 4-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3800                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            8                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    8064576(0x7b0e40) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    8064576(0x7b0e40) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
    N/A                      
*** Done ***             

The previous 4.0.0 version worked without any issues.

  • OS: Fedora 33
  • Kernel: 5.11.8
  • GPU: Radeon VII
  • Mesa: 20.3.4
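
The error line in the rocminfo output above can be detected programmatically. The following is a minimal sketch, assuming the message wording matches the output quoted in this report (other ROCm releases may phrase it differently). The function names `disabled_devices` and `gpu_agent_count` are hypothetical helpers, not part of any ROCm tool.

```python
import re

# Pattern for the error line quoted in this report. The wording is copied from
# the rocminfo output above and is an assumption for any other ROCm release.
HSA_ERROR = re.compile(
    r"^HSA Error:\s+Incompatible kernel and userspace, (?P<device>.+?) disabled\."
)


def disabled_devices(rocminfo_output: str) -> list:
    """Return the device names that rocminfo reports as disabled."""
    devices = []
    for line in rocminfo_output.splitlines():
        match = HSA_ERROR.match(line.strip())
        if match:
            devices.append(match.group("device"))
    return devices


def gpu_agent_count(rocminfo_output: str) -> int:
    """Count agents whose 'Device Type' field is GPU."""
    pattern = r"^[ \t]*Device Type:[ \t]+GPU[ \t]*$"
    return len(re.findall(pattern, rocminfo_output, re.MULTILINE))


# Shortened sample taken from the output in this report.
sample = """ROCk module is loaded
HSA Error:  Incompatible kernel and userspace, Vega 20 [Radeon VII] disabled. Upgrade amdgpu.
  Device Type:             CPU
"""

print(disabled_devices(sample))
print(gpu_agent_count(sample))
```

To use it on a live system, pass in the captured text of `rocminfo`. Whether the error line is written to stdout or stderr is not established here, so capturing both streams is the safer choice.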

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 25 (3 by maintainers)

Most upvoted comments

With Linux kernel 5.12-rc4, the most recent release ATM, I still get the “upgrade amdgpu” error. It seems that no Linux kernel is new enough yet, i.e. the required amdgpu changes have not been merged upstream?

HSA Error:  Incompatible kernel and userspace, Vega 20 [Radeon VII] disabled. Upgrade amdgpu.

Linux x2 5.12.0-051200rc4-generic #202103212230 SMP Sun Mar 21 22:33:27 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Personally I prefer using an official Linux kernel with amdgpu, instead of dkms. As such, if the above is correct (?), I’ll have to postpone checking out ROCm 4.1 until an official kernel version is released that includes the required 4.1 amdgpu updates.

@ROCmSupport thank you for the clarification!

Please do put out an announcement when the ROCm-4.1 required patches make it into upstream, enabling RadeonVII to be used again with a stock Linux kernel as before.

@ROCmSupport should the issue maybe be reopened during the investigation?

Great to know, @tim77. Good to hear that 4.0 works perfectly with the 5.11 kernel. This point made me think a little more and prompts me to gather more information. I will share an update if there is any. Thank you.

This seems to be fixed in upstream kernel 5.13: https://github.com/RadeonOpenCompute/ROCm/issues/1478#issuecomment-851189122

I was able to confirm that it works with kernel 5.14. I upgraded the kernel from 5.10.60 to 5.14.21 on the identical system and the error went away.
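
The kernel reports in this thread can be turned into a quick check. This is a minimal sketch, assuming that 5.13 is the first upstream kernel carrying the fix, as the comments above suggest. That threshold comes from user reports for ROCm 4.1 with a Radeon VII, not from official documentation, and `MIN_FIXED` and `reported_fixed` are hypothetical names.

```python
import re


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


# Assumption: 5.13 is the first upstream kernel with the fix, per the comments
# in this thread. This is not taken from official ROCm documentation.
MIN_FIXED = (5, 13)


def reported_fixed(release: str) -> bool:
    """True if the kernel is at least the version reported as fixed."""
    return kernel_version(release) >= MIN_FIXED


# Kernel releases mentioned in this thread.
for release in ("5.10.60", "5.11.8", "5.12.0-051200rc4-generic", "5.14.21"):
    print(release, reported_fixed(release))
```

On a running system, `platform.release()` from the Python standard library returns the string to pass in.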

@ROCmSupport

Hi @tim77, are you saying that ROCm 4.0 works with the same 5.11 kernel? I don't think so. Please specify the kernel version you used with ROCm 4.0.

  • kernel 5.11.9 (most recent and up to date kernel).
  • dkms is not even installed.
  • OpenCL in ROCm v4.0.1 (rocm-opencl package, version 3.6Beta_17_g875c1f8_rocm_rel_4.0_26) works absolutely fine and is stable (except for the Cycles renderer in Blender, which doesn’t work, but this is a well-known issue).
  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 Vega 20 [Radeon VII]

Full clinfo output: clinfo.txt.

I’ve downgraded from v4.1 -> v4.0.1 and OpenCL works perfectly, without dkms. Screenshot with a hashcat benchmark for testing purposes: Screenshot from 2021-03-26 12-23-15

OpenCL with ROCm 4.1 works fine with my RX 570 on kernel 5.11 (Fedora 33) and no DKMS whatsoever. Here is the proof: https://openbenchmarking.org/result/2103244-AS-ROCM41DAR27

@ROCmSupport I have tried ROCm 3.3 and ROCm 4.0 on Linux kernels 5.10, 5.11 and 5.12 and they all work.

OTOH ROCm 4.1 does not work on any of the Linux kernels 5.10, 5.11, 5.12. The error message “Upgrade amdgpu”, as I understand now, is also misleading – as what is required is a downgrade of amdgpu to Linux kernel 5.8.

And, given this regression from ROCm 4.0 to ROCm 4.1, and the misleading error message, the result is closing the bug report…

Could the documentation please specify which minimal Linux kernel version is required for a ROCm 4.1 install without dkms?

@tim77 Did you reboot after installing rock-dkms-4.1?

You can check whether dkms has installed successfully.

meicai@meicai-X99:~$ dkms status
amdgpu, 4.1-11, 5.4.0-64-generic, x86_64: installed
amdgpu, 4.1-11, 5.8.0-44-generic, x86_64: installed
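
The same check can be scripted. Below is a minimal sketch, assuming `dkms status` prints lines in exactly the comma-separated format shown above; other dkms versions may format their output differently. `parse_dkms_status` and `installed_for` are hypothetical helper names.

```python
def parse_dkms_status(output: str) -> list:
    """Parse lines like 'amdgpu, 4.1-11, 5.8.0-44-generic, x86_64: installed'."""
    entries = []
    for line in output.splitlines():
        head, sep, state = line.rpartition(":")
        fields = [field.strip() for field in head.split(",")]
        if not sep or len(fields) != 4:
            continue  # skip anything that does not match the format shown above
        module, version, kernel, arch = fields
        entries.append({
            "module": module,
            "version": version,
            "kernel": kernel,
            "arch": arch,
            "state": state.strip(),
        })
    return entries


def installed_for(output: str, module: str, kernel: str) -> bool:
    """True if `module` is reported as installed for the given kernel release."""
    return any(
        entry["module"] == module
        and entry["kernel"] == kernel
        and entry["state"].startswith("installed")
        for entry in parse_dkms_status(output)
    )


# Sample copied from the output quoted above, including the prompt line.
sample = """meicai@meicai-X99:~$ dkms status
amdgpu, 4.1-11, 5.4.0-64-generic, x86_64: installed
amdgpu, 4.1-11, 5.8.0-44-generic, x86_64: installed
"""

print(installed_for(sample, "amdgpu", "5.8.0-44-generic"))
print(installed_for(sample, "amdgpu", "5.11.9"))
```

Comparing against the running kernel (for example the value of `platform.release()`) shows whether the module was built for the kernel that is actually booted, which is the case a missed reboot would hide.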