ROCm: Ubuntu 16.04 - Failed to find any OpenCL platforms (ROCm 1.7, not on the default 16.04 kernel, on 4.13.0-21-generic)

Hey,

I’m not able to get OpenCL running on my Ubuntu 16.04 system. The system is running 9 AMD RX470 GPUs.

Error with clinfo:

~$ /opt/rocm/opencl/bin/x86_64/clinfo
terminate called after throwing an instance of 'cl::Error'
what():  clGetPlatformIDs
Aborted (core dumped)

Error with Hello World example:

~$ ./HelloWorld
Failed to find any OpenCL platforms.
Failed to create OpenCL context.

Error with rocm-smi:

====================    ROCm System Management Interface    ====================
================================================================================
 GPU  Temp    AvgPwr   SCLK     MCLK     Fan      Perf    SCLK OD
  5   24.0c   13.216W  300Mhz   300Mhz   43.92%   auto      0%
  3   21.0c   13.216W  300Mhz   300Mhz   43.92%   auto      0%
Traceback (most recent call last):
  File "/opt/rocm/bin/rocm-smi", line 1058, in <module>
    showAllConcise(deviceList)
  File "/opt/rocm/bin/rocm-smi", line 728, in showAllConcise
    fan = str(getFanSpeed(device))
  File "/opt/rocm/bin/rocm-smi", line 358, in getFanSpeed
    fanLevel = int(getSysfsValue(device, 'fan'))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
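
The traceback above shows `int()` receiving `None`, which suggests that `getSysfsValue(device, 'fan')` found no fan reading for the third device. As a minimal sketch of a defensive conversion, assuming two hypothetical helpers (`parse_sysfs_int` and `format_fan` are illustrative names and not part of rocm-smi), the missing value could be reported as "N/A" instead of aborting the whole listing:

```python
from typing import Optional


def parse_sysfs_int(raw: Optional[str]) -> Optional[int]:
    """Convert a raw sysfs reading to int, or return None if it is unusable.

    Hypothetical helper for illustration; it is not part of rocm-smi.
    """
    if raw is None:
        # The sysfs entry was missing or unreadable.
        return None
    try:
        return int(raw.strip())
    except ValueError:
        # The entry existed but did not contain a plain integer.
        return None


def format_fan(raw: Optional[str]) -> str:
    """Return a printable fan value, falling back to 'N/A'."""
    value = parse_sysfs_int(raw)
    return "N/A" if value is None else str(value)
```

This only hides the symptom in the tool's output; the missing sysfs entry itself would still need investigating.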

Systeminfo:

Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
Kernel: 4.13.0-21-generic

lspci output:

00:00.0 Host bridge: Intel Corporation Device 5904 (rev 02)
00:02.0 VGA compatible controller: Intel Corporation Device 5906 (rev 02)
00:08.0 System peripheral: Intel Corporation Sky Lake Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI (rev 21)
00:17.0 SATA controller: Intel Corporation Sunrise Point-LP SATA Controller [AHCI mode] (rev 21)
00:1c.0 PCI bridge: Intel Corporation Device 9d12 (rev f1)
00:1c.3 PCI bridge: Intel Corporation Device 9d13 (rev f1)
00:1c.4 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1)
00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port (rev f1)
00:1f.0 ISA bridge: Intel Corporation Device 9d50 (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Device 9d71 (rev 21)
00:1f.4 SMBus: Intel Corporation Sunrise Point-LP SMBus (rev 21)
01:00.0 PCI bridge: ASMedia Technology Inc. Device 1184
02:01.0 PCI bridge: ASMedia Technology Inc. Device 1184
02:03.0 PCI bridge: ASMedia Technology Inc. Device 1184
02:05.0 PCI bridge: ASMedia Technology Inc. Device 1184
02:07.0 PCI bridge: ASMedia Technology Inc. Device 1184
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
06:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
07:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
08:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 07)
09:00.0 PCI bridge: ASMedia Technology Inc. Device 1184
0a:01.0 PCI bridge: ASMedia Technology Inc. Device 1184
0a:03.0 PCI bridge: ASMedia Technology Inc. Device 1184
0a:05.0 PCI bridge: ASMedia Technology Inc. Device 1184
0a:07.0 PCI bridge: ASMedia Technology Inc. Device 1184
0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
0b:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
0c:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
0d:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
0d:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)
0e:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
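
The listing above contains nine AMD VGA controllers, while rocm-smi printed only two GPUs before it failed. A minimal sketch for counting the AMD GPUs that the PCI bus reports, so the number can be compared against what the ROCm tools see, might look like this. It assumes the plain `lspci` text format shown above; `amd_gpu_slots` is a hypothetical helper name.

```python
import re
from typing import List

# Matches lines such as:
# "03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev cf)"
_VGA_LINE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) VGA compatible controller: (?P<desc>.*)$"
)


def amd_gpu_slots(lspci_text: str) -> List[str]:
    """Return the PCI slots of all AMD/ATI VGA controllers in lspci output."""
    slots = []
    for line in lspci_text.splitlines():
        match = _VGA_LINE.match(line.strip())
        if match and "[AMD/ATI]" in match.group("desc"):
            slots.append(match.group("slot"))
    return slots
```

Passing the saved output of `lspci` to `amd_gpu_slots` and taking `len()` of the result gives the GPU count as seen on the bus.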

I guess the link to the OpenCL info is missing somewhere on the filesystem or in an environment variable? Can someone please advise me on how to troubleshoot this problem?
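
One place worth checking for such a missing link: on Linux, the Khronos OpenCL ICD loader discovers platforms by reading `*.icd` files in `/etc/OpenCL/vendors`, each of which names a vendor library to load. The sketch below lists those files and their contents. It is a rough diagnostic under the assumption that the installed OpenCL runtime uses this standard ICD mechanism, and `list_icd_entries` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict, List, Union


def list_icd_entries(
    vendors_dir: Union[str, Path] = "/etc/OpenCL/vendors",
) -> Dict[str, List[str]]:
    """Map each *.icd file name to the non-empty lines it contains."""
    directory = Path(vendors_dir)
    entries: Dict[str, List[str]] = {}
    if not directory.is_dir():
        # No vendors directory at all: the loader has nothing to discover.
        return entries
    for icd_file in sorted(directory.glob("*.icd")):
        lines = [ln.strip() for ln in icd_file.read_text().splitlines() if ln.strip()]
        entries[icd_file.name] = lines
    return entries
```

An empty result would be consistent with `clGetPlatformIDs` finding no platforms; a non-empty result points the investigation at whether the listed libraries exist and can be loaded.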

Thanks! Regards

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 52 (3 by maintainers)

Most upvoted comments

@tekcomm

My team develops the userland components. We are now trying to break down the DKMS build process so it is better documented; we get your pain, and we need this in our team as well. The base Linux driver is developed by a separate team from the ROCm userland team; we work on everything from the ROCr system library upward.

You can see we are trying to get better documentation in place, especially around the base Linux driver. We are still a small team that has been a startup within AMD for the last two years. I personally put up the ROCm website and the new documentation site.

Have you looked at this install guide? It tells you which tools we need for the release.

Also, we have been moving all new documentation here.

You can see we have even documented all of our system-level debug flags.

Also, we are very explicit about which GPUs we support with ROCm. You can get an even better understanding of the core compiler technology, code object format, etc. here: https://llvm.org/docs/AMDGPUUsage.html

On the Linux driver: in the past, ROCm used a bootleg kernel driver in which we fully controlled what we needed for the ROCm userland stack. Understand that we use newer PCIe features that older hardware did not support (PCIe Gen3 atomic completion / PCIe atomics, for example), because ROCm was targeted at Intel Xeon E5 server hardware, and every system in this space has to go through system-level validation at AMD and the OEM/ODM. Note that Ryzen, Threadripper, and EPYC support all the capabilities of Xeon E5 v3 that we use.

On the base Linux driver, we have made four changes since the early days of the project:

  • First, we moved to a DKMS-style install on base Linux kernels for the binary product, starting with 1.7. This was critical for us to support more distros (RHEL/CentOS, SUSE) with new hardware that is not upstream. This transition was not smooth, and we are working on remedying this.

  • Second, we have been working with the Linux team to upstream all the core ROCm Linux driver changes, so that when you pull a base Linux kernel, the ROCm userland just works, except for our existing shipping GPUs that are documented to work with ROCm. Right now this looks to be complete with the 1.7 or 1.8 Linux kernel release.

  • Third, from lessons learned on ROCm 1.0 to 1.6.4, we dove in with the core Linux team and improved the testing program. At the ROCm level we are growing our test data center: we have rolled in 30 new servers since December, and another 20 will be in by the end of March. One thing we are building out is a Customer Validation Test Suite for customers to validate their install; this should be complete by June. We will also make more of our ROCm validation tests available to simplify customer driver development.

  • Fourth, we are working with the Linux team to improve documentation of the base driver; this will take time.

I want to ask for patience as we work on improving the project. I personally started the ROCm project at AMD with Ben Sander, since we knew we had to change how we approached GPU computing. We are working hard to make the adjustments needed to drive this to a solid product. Also, we are working to get key foundations in place so we can all chase down issues.

I want the group to see that we are working hard to address the 1.7 release issues. To help you and us, I am giving you early access to the 1.7.1 beta, which has been tested with the 4.13 generic Linux kernel: http://repo.radeon.com/misc/archive/beta/rocm-1.7.1-beta.tar.gz

best regards,

Greg CTO RTG - Systems Engineering

We are working on bringing it into the next major release of ROCm.

On ROCm contributions: we would love to get more community support. We need help in lots of areas: testing, documentation, and more. We want to grow the community and make it open and collaborative. I would be happy even if you brought the ROCm userland to NVIDIA, Intel, Qualcomm, and ARM hardware.