compute-runtime: NEO driver does not detect the GPU when using kernel 6.8.x.
The NEO driver does not detect the GPU when using kernel 6.8.x.
With kernels 6.5.x and 6.6.x, the GPU is detected:
/opt/intel/oneapi/compiler/2024.0/bin/sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185GRE @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:acc:2] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [24.05.28454.6]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28454]
On kernel 6.8.x, the GPU entries are missing:
/opt/intel/oneapi/compiler/2024.0/bin/sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185GRE @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:acc:2] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
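For anyone who wants to script this check, here is a minimal sketch that parses `sycl-ls` output and reports whether any GPU device is listed. It assumes the `[backend:type:index] description` line format shown in the two listings above; the function names `parse_sycl_ls` and `has_gpu` are made up for this example and are not part of any Intel tool.

```python
import re

# Matches entries such as "[opencl:gpu:3] Intel(R) OpenCL Graphics, ...".
# The format is inferred from the sycl-ls output quoted in this thread.
ENTRY_RE = re.compile(
    r"^\[(?P<backend>[^:\]]+):(?P<kind>[^:\]]+):(?P<index>\d+)\]\s*(?P<desc>.*)$"
)


def parse_sycl_ls(text):
    """Return a list of (backend, kind, index, description) tuples."""
    devices = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            devices.append((m["backend"], m["kind"], int(m["index"]), m["desc"]))
    return devices


def has_gpu(text):
    """True if at least one listed device has the type 'gpu'."""
    return any(kind == "gpu" for _, kind, _, _ in parse_sycl_ls(text))


# Lines copied from the listings above (CPU entry plus the two GPU entries).
OUTPUT_WITH_GPU = """\
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185GRE @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO [24.05.28454.6]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28454]
"""

# On kernel 6.8.x only the non-GPU entries remain.
OUTPUT_WITHOUT_GPU = """\
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185GRE @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
"""

if __name__ == "__main__":
    print("GPU listed (first sample):", has_gpu(OUTPUT_WITH_GPU))
    print("GPU listed (second sample):", has_gpu(OUTPUT_WITHOUT_GPU))
```

To run it against a live system, the text could come from `subprocess.run([...], capture_output=True, text=True).stdout` with the `sycl-ls` path used above.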
About this issue
- State: closed
- Created 4 months ago
- Comments: 41 (7 by maintainers)
Commits related to this issue
- Moving to 6.7 kernel due to issue with intel-compute-runtime on kernel 6.8 - https://github.com/intel/compute-runtime/issues/710 — committed to TimoVerbrugghe/homelab-monorepo by TimoVerbrugghe 3 months ago
- Add workaround for https://github.com/intel/compute-runtime/issues/710 — committed to silee2/llvm-project by silee2 3 months ago
- Bump Intel Compute Runtime to 24.13.29138.7 Fixes an upstream issue in the last version https://github.com/intel/compute-runtime/issues/710 — committed to nyanmisaka/jellyfin-packaging by nyanmisaka 2 months ago
Good news folks, we are going to adjust the logic on UMD side so we can accept new gtt size reported by i915 😉
I can reproduce this on Arch with Linux 6.8 release (6.8.1-arch1-1) using i915. Haven’t tried xe yet.
Exporting these works fine:
On 6.8:
On 6.7:
The issue seems to lie here: https://github.com/intel/compute-runtime/blob/03078541d7bcfdf2b669a07410e5a7bacf436c63/shared/source/memory_manager/gfx_partition.cpp#L250-L253
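To illustrate why a changed GTT size could make the driver drop the device, here is a rough sketch of the difference between matching the reported address-space limit exactly and classifying it by bit width. This is not the compute-runtime code and not the actual fix; the set of accepted widths, the function names, and the "reduced" value are all hypothetical and only serve to show the idea of accepting a slightly smaller size.

```python
# Hypothetical set of address-space widths a driver might recognise.
KNOWN_WIDTHS = (48, 47, 31)


def max_n_bit_value(bits):
    """Largest value representable in `bits` bits."""
    return (1 << bits) - 1


def strict_width(limit):
    """Accept the limit only if it is exactly the maximum n-bit value."""
    for bits in KNOWN_WIDTHS:
        if limit == max_n_bit_value(bits):
            return bits
    return None  # unknown size: a driver using this check would give up


def tolerant_width(limit):
    """Accept any limit whose highest set bit falls into a known width."""
    bits = limit.bit_length()
    return bits if bits in KNOWN_WIDTHS else None


if __name__ == "__main__":
    full = max_n_bit_value(48)
    # Made-up example of a kernel reporting slightly less than the full range.
    reduced = full - 0x10000
    print("strict:", strict_width(full), strict_width(reduced))
    print("tolerant:", tolerant_width(full), tolerant_width(reduced))
```

With the exact comparison, any small reduction in the reported size falls through to the "unknown" case, which would match the symptom of the GPU silently disappearing from the device list.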
Release: https://github.com/intel/compute-runtime/releases/tag/24.09.28717.12
Tested with: Ubuntu 24.04 Alpha. Linux Kernel 6.8.4-lowlatency. TGL: 11th Gen Intel® Core™ i7-1185GRE @ 2.80GHz
Tested 6.8.0-rc3 based Xe KMD, and compute/Sysman driver worked with that, so this issue seems to be i915 KMD specific (as expected).
uploaded the fix to noble, thanks for the ping
@Disty0 If the issue also happens with the 6.6 kernel, I do not think it is related to this issue, so please file a separate one, and also report the compute-runtime version and where perf reports the CPU usage to happen (run as root):
New 6.8.5, 6.8.6 and 6.6.27 LTS kernels are unable to run using the GPU. It detects and tries to run on the GPU but gets stuck with 100% single CPU core usage. Happens on any OpenCL or SYCL app. (Kernel 6.8 is using the workaround provided in this thread.)
You can downgrade to Linux 6.8.4 for Arch Linux with these packages: linux 6.8.4: https://archive.archlinux.org/packages/l/linux/linux-6.8.4.arch1-1-x86_64.pkg.tar.zst linux-headers 6.8.4: https://archive.archlinux.org/packages/l/linux-headers/linux-headers-6.8.4.arch1-1-x86_64.pkg.tar.zst
Um, its release notes mention it still needing the env var workaround?
Slightly newer tag includes actual fix: https://github.com/intel/compute-runtime/compare/24.09.28717.12...24.09.28717.14
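A small sketch for checking whether a given compute-runtime tag is at or after the one that includes the fix. It assumes the dotted numeric fields of the tags order the same way the releases do, which is my assumption and not something stated in this thread; `parse_tag` and `has_fix` are hypothetical helper names. Distro packages may also carry the fix as a backport under an older version number, in which case this check would be too pessimistic.

```python
def parse_tag(tag):
    """Split a tag such as '24.09.28717.14' into a tuple of integers."""
    return tuple(int(part) for part in tag.strip().split("."))


# Tag mentioned above as the first one containing the actual fix.
FIRST_FIXED = parse_tag("24.09.28717.14")


def has_fix(tag):
    """True if `tag` sorts at or after the first fixed tag (assumed ordering)."""
    return parse_tag(tag) >= FIRST_FIXED


if __name__ == "__main__":
    # Tags taken from this thread.
    for tag in ("24.05.28454.6", "24.09.28717.12", "24.09.28717.14", "24.13.29138.7"):
        print(tag, "has fix:", has_fix(tag))
```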
This is good news.
Note that the upcoming Ubuntu 24.04 LTS uses the non-LTS 6.8 kernel. Hopefully it can be fixed before it’s released next month. Otherwise OpenCL will not be available on many distros based on it.
In this case, will the NEO compute driver be adapted to work with the new behaviour?
Media and 3D drivers seem to work fine with that change, so why is it a problem for the L0/compute stack?
(I’m wondering whether this change should be reported to upstream as kernel stable ABI breakage…)
Looking at the compute-runtime code, it seems to affect SVM capability & address space size: https://github.com/intel/compute-runtime/blob/master/shared/source/os_interface/linux/product_helper_drm.cpp#L128
Whereas in the Mesa code: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/vulkan/anv_device.c#L2300
Yes, it works with 6.7 (drm-tip) kernel also for me, just not with 6.8 (i915 KMD).
EDIT: that was with the public Xe KMD repo, not drm-tip. With drm-tip, the issue already occurs with an earlier kernel version (see below).