ROCm: rocm-smi 3.9 & 3.10 returns error

I updated my Ubuntu 20.04 container from ROCm 3.8 to ROCm 3.9, and rocm-smi now produces the following error:

ERROR:root:ROCm SMI returned 8 (the expected value is 0)

Running rocm_smi.py produces the same error, but rocm_smi_deprecated.py seems to work as expected with the following output:

========================ROCm System Management Interface========================
================================================================================
GPU  Temp   AvgPwr  SCLK     MCLK    Fan   Perf  PwrCap  VRAM%  GPU%  
1    31.0c  7.0W    1269Mhz  945Mhz  0.0%  auto  220.0W    0%   0%    
================================================================================
==============================End of ROCm SMI Log ==============================

I have tried completely uninstalling and reinstalling ROCm, but the error persists.

rocm-smi version: 3.9.0
Kernel version: 5.4.65-1-pve (container host is running Proxmox)
GPU: Radeon Instinct MI25

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 22

Most upvoted comments

Same issue with 4.1, @kentrussell:

vladi@vladi-TB250-BTC:~$ rocm-smi
Failed to get "domain" properity from properties files for kfd node 1.
rsmi_init() failed
ERROR:root:ROCm SMI returned 8 (the expected value is 0)
vladi@vladi-TB250-BTC:~$ sudo ldconfig -p | grep rocm | grep smi
[sudo] password for vladi:
        librocm_smi64.so.3 (libc6,x86-64) => /opt/rocm/lib/librocm_smi64.so.3
        librocm_smi64.so.3 (libc6,x86-64) => /opt/rocm/rocm_smi/lib/librocm_smi64.so.3
        librocm_smi64.so (libc6,x86-64) => /opt/rocm/lib/librocm_smi64.so
        librocm_smi64.so (libc6,x86-64) => /opt/rocm/rocm_smi/lib/librocm_smi64.so
vladi@vladi-TB250-BTC:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal
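
The ldconfig listing above shows each librocm_smi64 soname registered from two directories. Whether that is related to the rsmi_init() failure is not established in this thread, but for anyone wanting to check their own linker cache for the same pattern, here is a minimal sketch. The function name find_duplicate_sonames and the "rocm_smi" filter string are made up for illustration; the script only assumes that ldconfig is on PATH and that cache entries use the "name (flags) => path" layout shown above.

```python
import subprocess
from collections import defaultdict


def find_duplicate_sonames(ldconfig_output, needle="rocm_smi"):
    """Return {soname: [paths]} for matching sonames listed with more than one path.

    Hypothetical helper for illustration; it only parses text in the
    "name (flags) => path" layout printed by `ldconfig -p`.
    """
    paths_by_soname = defaultdict(list)
    for line in ldconfig_output.splitlines():
        # Skip the summary header and any entry that does not match the filter.
        if "=>" not in line or needle not in line:
            continue
        left, path = line.split("=>", 1)
        # Drop the "(libc6,x86-64)" flags to keep only the soname.
        soname = left.split("(", 1)[0].strip()
        paths_by_soname[soname].append(path.strip())
    return {name: paths for name, paths in paths_by_soname.items() if len(paths) > 1}


if __name__ == "__main__":
    # `ldconfig -p` prints the current linker cache; reading it does not need root.
    output = subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    ).stdout
    for soname, paths in find_duplicate_sonames(output).items():
        print(soname)
        for path in paths:
            print("    " + path)
```

An empty result means no matching library is registered from more than one location. A non-empty result only reports the duplication; it does not say which copy rocm-smi actually loads.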