ROCm: ROCm 2.2 problems on Ubuntu 18.04 with the recommended 4.15 kernel

Unfortunately, my installation of ROCm 2.2 fails on Ubuntu 18.04 with the recommended older kernel, 4.15.0-20-generic.

System details: Supermicro H11SSL-i motherboard, AMD Epyc 7521 CPU, Radeon VII GPU.

The installation of rocm-dkms for 4.15.0-20-generic seems to go well (the DKMS build fails only for the other installed kernel, 4.18.0-16-generic):

sudo apt install rocm-dkms
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  comgr dkms hcc hip_base hip_doc hip_hcc hip_samples hsa-amd-aqlprofile hsa-ext-rocr-dev hsa-rocr-dev hsakmt-roct hsakmt-roct-dev rock-dkms rocm-clang-ocl rocm-dev rocm-device-libs rocm-opencl
  rocm-opencl-dev rocm-smi rocm-utils rocminfo rocr_debug_agent
Suggested packages:
  menu
The following NEW packages will be installed:
  comgr dkms hcc hip_base hip_doc hip_hcc hip_samples hsa-amd-aqlprofile hsa-ext-rocr-dev hsa-rocr-dev hsakmt-roct hsakmt-roct-dev rock-dkms rocm-clang-ocl rocm-dev rocm-device-libs rocm-dkms
  rocm-opencl rocm-opencl-dev rocm-smi rocm-utils rocminfo rocr_debug_agent
0 upgraded, 23 newly installed, 0 to remove and 0 not upgraded.
Need to get 68,0 kB/442 MB of archives.
After this operation, 2 054 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ru.archive.ubuntu.com/ubuntu bionic-updates/main amd64 dkms all 2.3-3ubuntu9.2 [68,0 kB]
Fetched 68,0 kB in 0s (1 319 kB/s)
Selecting previously unselected package comgr.
(Reading database ... 194442 files and directories currently installed.)
Preparing to unpack .../00-comgr_1.1.0_amd64.deb ...
Unpacking comgr (1.1.0) ...
Selecting previously unselected package dkms.
Preparing to unpack .../01-dkms_2.3-3ubuntu9.2_all.deb ...
Unpacking dkms (2.3-3ubuntu9.2) ...
Selecting previously unselected package hsa-ext-rocr-dev.
Preparing to unpack .../02-hsa-ext-rocr-dev_1.1.9-55-gbac2a9b_amd64.deb ...
Unpacking hsa-ext-rocr-dev (1.1.9-55-gbac2a9b) ...
Selecting previously unselected package hsakmt-roct.
Preparing to unpack .../03-hsakmt-roct_1.0.9-121-g876627e_amd64.deb ...
Unpacking hsakmt-roct (1.0.9-121-g876627e) ...
Selecting previously unselected package hsakmt-roct-dev.
Preparing to unpack .../04-hsakmt-roct-dev_1.0.9-121-g876627e_amd64.deb ...
Unpacking hsakmt-roct-dev (1.0.9-121-g876627e) ...
Selecting previously unselected package hsa-rocr-dev.
Preparing to unpack .../05-hsa-rocr-dev_1.1.9-55-gbac2a9b_amd64.deb ...
Unpacking hsa-rocr-dev (1.1.9-55-gbac2a9b) ...
Selecting previously unselected package rocminfo.
Preparing to unpack .../06-rocminfo_1.0.0_amd64.deb ...
Unpacking rocminfo (1.0.0) ...
Selecting previously unselected package rocm-opencl.
Preparing to unpack .../07-rocm-opencl_1.2.0-2019030702_amd64.deb ...
Unpacking rocm-opencl (1.2.0-2019030702) ...
Selecting previously unselected package rocm-opencl-dev.
Preparing to unpack .../08-rocm-opencl-dev_1.2.0-2019030702_amd64.deb ...
Unpacking rocm-opencl-dev (1.2.0-2019030702) ...
Selecting previously unselected package rocm-clang-ocl.
Preparing to unpack .../09-rocm-clang-ocl_0.4.0-7ce124f_amd64.deb ...
Unpacking rocm-clang-ocl (0.4.0-7ce124f) ...
Selecting previously unselected package rocm-utils.
Preparing to unpack .../10-rocm-utils_2.2.31_amd64.deb ...
Unpacking rocm-utils (2.2.31) ...
Selecting previously unselected package hcc.
Preparing to unpack .../11-hcc_1.3.19092_amd64.deb ...
Unpacking hcc (1.3.19092) ...
Selecting previously unselected package hip_base.
Preparing to unpack .../12-hip%5fbase_1.5.19055_amd64.deb ...
Unpacking hip_base (1.5.19055) ...
Selecting previously unselected package hip_doc.
Preparing to unpack .../13-hip%5fdoc_1.5.19055_amd64.deb ...
Unpacking hip_doc (1.5.19055) ...
Selecting previously unselected package hip_hcc.
Preparing to unpack .../14-hip%5fhcc_1.5.19055_amd64.deb ...
Unpacking hip_hcc (1.5.19055) ...
Selecting previously unselected package hip_samples.
Preparing to unpack .../15-hip%5fsamples_1.5.19055_amd64.deb ...
Unpacking hip_samples (1.5.19055) ...
Selecting previously unselected package hsa-amd-aqlprofile.
Preparing to unpack .../16-hsa-amd-aqlprofile_1.0.0_amd64.deb ...
Unpacking hsa-amd-aqlprofile (1.0.0) ...
Selecting previously unselected package rock-dkms.
Preparing to unpack .../17-rock-dkms_2.2-31_all.deb ...
Unpacking rock-dkms (2.2-31) ...
Selecting previously unselected package rocm-device-libs.
Preparing to unpack .../18-rocm-device-libs_0.0.1_amd64.deb ...
Unpacking rocm-device-libs (0.0.1) ...
Selecting previously unselected package rocm-smi.
Preparing to unpack .../19-rocm-smi_1.0.0-102-gdb444a9_amd64.deb ...
Unpacking rocm-smi (1.0.0-102-gdb444a9) ...
Selecting previously unselected package rocr_debug_agent.
Preparing to unpack .../20-rocr%5fdebug%5fagent_1.0.0_amd64.deb ...
Unpacking rocr_debug_agent (1.0.0) ...
Selecting previously unselected package rocm-dev.
Preparing to unpack .../21-rocm-dev_2.2.31_amd64.deb ...
Unpacking rocm-dev (2.2.31) ...
Selecting previously unselected package rocm-dkms.
Preparing to unpack .../22-rocm-dkms_2.2.31_amd64.deb ...
Unpacking rocm-dkms (2.2.31) ...
Setting up comgr (1.1.0) ...
Setting up rocr_debug_agent (1.0.0) ...
Setting up rocm-smi (1.0.0-102-gdb444a9) ...
Setting up rocm-device-libs (0.0.1) ...
Setting up hip_base (1.5.19055) ...
Setting up hsa-ext-rocr-dev (1.1.9-55-gbac2a9b) ...
Setting up hsakmt-roct (1.0.9-121-g876627e) ...
Setting up dkms (2.3-3ubuntu9.2) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up rocminfo (1.0.0) ...
Setting up hsa-amd-aqlprofile (1.0.0) ...
Setting up hip_doc (1.5.19055) ...
Setting up hip_samples (1.5.19055) ...
Setting up hsakmt-roct-dev (1.0.9-121-g876627e) ...
Setting up rock-dkms (2.2-31) ...
Loading new amdgpu-2.2-31 DKMS files...
Building for 4.15.0-20-generic 4.18.0-16-generic
Building for architecture x86_64
Building initial module for 4.15.0-20-generic
Done.
Forcing installation of amdgpu

amdgpu:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-20-generic/updates/dkms/

amdttm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-20-generic/updates/dkms/

amdkcl.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-20-generic/updates/dkms/

amdchash.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-20-generic/updates/dkms/

amd-sched.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.15.0-20-generic/updates/dkms/

Running the post_install script:
update-initramfs: Generating /boot/initrd.img-4.15.0-20-generic

depmod...

Backing up initrd.img-4.15.0-20-generic to /boot/initrd.img-4.15.0-20-generic.old-dkms
Making new initrd.img-4.15.0-20-generic
(If next boot fails, revert to initrd.img-4.15.0-20-generic.old-dkms image)
update-initramfs....

DKMS: install completed.
Building initial module for 4.18.0-16-generic
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/rock-dkms.0.crash'
Error! Bad return status for module build on kernel: 4.18.0-16-generic (x86_64)
Consult /var/lib/dkms/amdgpu/2.2-31/build/make.log for more information.
Setting up hsa-rocr-dev (1.1.9-55-gbac2a9b) ...
Setting up rocm-opencl (1.2.0-2019030702) ...
Setting up rocm-opencl-dev (1.2.0-2019030702) ...
Setting up rocm-clang-ocl (0.4.0-7ce124f) ...
Setting up rocm-utils (2.2.31) ...
Setting up hcc (1.3.19092) ...
Setting up hip_hcc (1.5.19055) ...
Setting up rocm-dev (2.2.31) ...
Setting up rocm-dkms (2.2.31) ...
KERNEL=="kfd", MODE="0666"
Processing triggers for libc-bin (2.27-3ubuntu1) ...
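
The log above contains a successful DKMS build for 4.15.0-20-generic and a failed one for 4.18.0-16-generic, and the failure is easy to miss in the middle of the apt output. As a rough sketch, assuming the install output was saved to a file (the file name in the comment is hypothetical), the failed kernels can be pulled out by matching the "Error! Bad return status" line format seen above:

```shell
#!/bin/sh
# Print the kernel versions for which DKMS reported a failed module build.
# Reads an apt/dkms log on stdin and matches the error line format that
# appears in the log above; other DKMS versions may word it differently.
dkms_failed_kernels() {
    sed -n 's/^Error! Bad return status for module build on kernel: \([^ ]*\).*/\1/p'
}

# Example (rocm-install.log is a hypothetical file name):
#   dkms_failed_kernels < rocm-install.log
```

Comparing the result with the output of uname -r shows whether the currently booted kernel is one of those without a built amdgpu module.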

However, a basic test fails:

sudo /opt/rocm/bin/rocminfo
hsa api call failure at line 900, file: /data/jenkins_workspace/compute-rocm-rel-2.2/rocminfo/rocminfo.cc. Call returned 4104

There is no /dev/kfd device node, and dmesg | grep amdkfd prints nothing.
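
The check described above can be scripted. This is a minimal sketch, not part of ROCm: kfd_present is a hypothetical helper name, and it only tests that the path is a character device, which is what /dev/kfd (the node named in the KERNEL=="kfd" udev rule from the install log) should be once the driver has loaded.

```shell
#!/bin/sh
# Return success if the given path is a character device node.
# Defaults to /dev/kfd; the optional argument is only there so that the
# helper can be tried against other paths.
kfd_present() {
    [ -c "${1:-/dev/kfd}" ]
}

# Example:
#   kfd_present && echo "kfd node found" || echo "kfd node missing"
```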

Moreover, the kernel log shows unresolved-symbol errors from amdgpu:

dmesg | grep amdgpu
[    2.700394] amdgpu: Unknown symbol amd_iommu_bind_pasid (err 0)
[    2.700832] amdgpu: Unknown symbol amd_iommu_set_invalidate_ctx_cb (err 0)
[    2.700937] amdgpu: Unknown symbol amd_iommu_free_device (err 0)
[    2.701315] amdgpu: Unknown symbol amd_iommu_unbind_pasid (err 0)
[    2.701343] amdgpu: Unknown symbol amd_iommu_init_device (err 0)
[    2.701518] amdgpu: Unknown symbol amd_iommu_set_invalid_ppr_cb (err 0)
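
All six unresolved symbols share the amd_iommu_ prefix, which suggests that amdgpu depends on a separate kernel module that is not available for this kernel. The sketch below (unknown_symbols is a hypothetical helper name) reduces such dmesg output to a unique list of symbol names, which makes the common prefix easier to spot. That these symbols come from the amd_iommu_v2 module is an assumption based on mainline kernels of that period, not something the log itself states.

```shell
#!/bin/sh
# Extract the unique unresolved symbol names from kernel log text on stdin.
# Matches the "Unknown symbol <name>" wording shown in the dmesg output above.
unknown_symbols() {
    sed -n 's/.*Unknown symbol \([A-Za-z0-9_]*\).*/\1/p' | sort -u
}

# Example:
#   dmesg | grep amdgpu | unknown_symbols
```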

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15

Most upvoted comments

Everything works now that I have installed the extra kernel modules with sudo apt install linux-modules-extra-4.15.0-20-generic!

Thank you very much for your support!
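
To confirm that a fix like this took effect, one can check whether the needed module file exists for the target kernel. The sketch below is a hedged illustration: module_installed is a hypothetical helper, and the module name amd_iommu_v2 in the example is an assumption inferred from the amd_iommu_ symbol names in the dmesg output, since the thread does not name the module explicitly.

```shell
#!/bin/sh
# Return success if a module file named <module>.ko (optionally compressed,
# e.g. .ko.xz) exists anywhere under <modules_root>/<kernel>.
# Usage: module_installed <module> [kernel] [modules_root]
module_installed() {
    mod=$1
    kernel=${2:-$(uname -r)}      # default: the running kernel
    root=${3:-/lib/modules}       # default: standard module directory
    [ -n "$(find "$root/$kernel" -name "$mod.ko*" 2>/dev/null | head -n 1)" ]
}

# Example (module name is an assumption, see the note above):
#   module_installed amd_iommu_v2 4.15.0-20-generic && echo "module present"
```

If the file is present, rebooting or reloading amdgpu should then let the driver resolve its symbols, after which /dev/kfd is expected to appear.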