compute-runtime: NEO driver does not detect the GPU with kernel 6.8.x

The NEO driver does not detect the GPU when running kernel 6.8.x.

With kernels 6.5.x and 6.6.x, the GPU is present:

/opt/intel/oneapi/compiler/2024.0/bin/sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185GRE @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:acc:2] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [24.05.28454.6]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.28454]

On kernel 6.8.x, both GPU entries (OpenCL and Level Zero) are missing:

/opt/intel/oneapi/compiler/2024.0/bin/sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185GRE @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:acc:2] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
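A quick way to tell the two cases apart in a script is to check whether sycl-ls lists any GPU device at all. A minimal sketch, assuming only the `[backend:gpu:N]` line prefix seen in the listings above; the function name is made up for illustration:

```shell
# Succeeds (exit status 0) if the sycl-ls output read from stdin lists at least
# one GPU device from either the OpenCL or the Level Zero backend.
has_gpu_device() {
    grep -Eq '^\[(opencl|ext_oneapi_level_zero):gpu:[0-9]+]'
}

# Example (path as in the report above):
# /opt/intel/oneapi/compiler/2024.0/bin/sycl-ls | has_gpu_device || echo "no GPU visible"
```

On the 6.6.x output above this succeeds; on the 6.8.x output it fails, because only the CPU and FPGA-emulation entries remain.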

About this issue

  • State: closed
  • Created 4 months ago
  • Comments: 41 (7 by maintainers)

Most upvoted comments

Good news, folks: we are going to adjust the logic on the UMD side so that it accepts the new GTT size reported by i915 😉

I can reproduce this on Arch with the Linux 6.8 release (6.8.1-arch1-1) using i915. I haven’t tried Xe yet.

Exporting these variables works around the problem:

export NEOReadDebugKeys=1
export OverrideGpuAddressSpace=48
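To apply the workaround to a single command instead of the whole shell session, a small wrapper is enough. This is a sketch: the helper name is hypothetical, and only the two variables and their values come from this thread.

```shell
# Hypothetical helper: run one command with the NEO debug-key workaround from
# this thread set only in that command's environment (nothing is exported).
# NEOReadDebugKeys=1 lets NEO read debug keys from the environment;
# OverrideGpuAddressSpace=48 is the override value used in this thread.
with_neo_gpu_override() {
    env NEOReadDebugKeys=1 OverrideGpuAddressSpace=48 "$@"
}
```

For example, `with_neo_gpu_override clinfo -l`, assuming clinfo is installed, lets you check whether the iGPU shows up with the override on an affected 6.8 kernel.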

On 6.8:

gpuAddressSpace = 281474976706559
= 111111111111111111111111111111111110111111111111

On 6.7:

gpuAddressSpace = 281474976710655
= 111111111111111111111111111111111111111111111111
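The two values differ by exactly one 4 KiB page: the 6.7 value is 2^48 - 1, while the 6.8 value has bit 12 cleared. The shell arithmetic below checks this using only the numbers quoted above. If gpuAddressSpace is derived as the reported GTT size minus one (an assumption, not verified here), i915 on 6.8 reports a GTT size of 2^48 - 4096 instead of 2^48.

```shell
# gpuAddressSpace values quoted in this thread.
gas_67=281474976710655   # Linux 6.7
gas_68=281474976706559   # Linux 6.8

max48=$(( (1 << 48) - 1 ))                    # largest 48-bit value, 2^48 - 1
printf 'max48 = %d (0x%X)\n' "$max48" "$max48" # max48 = 281474976710655 (0xFFFFFFFFFFFF)
printf '6.7   = 0x%X\n' "$gas_67"              # 6.7   = 0xFFFFFFFFFFFF
printf '6.8   = 0x%X\n' "$gas_68"              # 6.8   = 0xFFFFFFFFEFFF
printf 'diff  = %d\n' $(( gas_67 - gas_68 ))   # diff  = 4096 (one 4 KiB page)
printf 'xor   = 0x%X\n' $(( gas_67 ^ gas_68 )) # xor   = 0x1000 (only bit 12 differs)
```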

The issue seems to lie here: https://github.com/intel/compute-runtime/blob/03078541d7bcfdf2b669a07410e5a7bacf436c63/shared/source/memory_manager/gfx_partition.cpp#L250-L253
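Assuming the linked code only accepts address-space values that are exactly 2^n - 1 for a few known widths (not verified here), the one-page reduction alone would be enough to reject the device. The sketch below is a hypothetical shell illustration of that failure mode and of a tolerant alternative that rounds the value up to the next 2^n - 1. It is not the compute-runtime code, all function names are invented, and the actual fix may work differently.

```shell
# Hypothetical illustration only; NOT the compute-runtime implementation.

# Largest value representable in N bits: 2^N - 1.
max_n_bit_value() { echo $(( (1 << $1) - 1 )); }

# Exact check: accepts only a full 2^48 - 1 address space.
accepts_exact_48() {
    [ "$1" -eq "$(max_n_bit_value 48)" ]
}

# Smallest N such that 2^N - 1 >= value, i.e. the value rounded up to 2^N - 1.
address_bits() {
    bits=0
    while [ "$(max_n_bit_value "$bits")" -lt "$1" ]; do
        bits=$(( bits + 1 ))
    done
    echo "$bits"
}

# Tolerant check: accepts any value that rounds up to a 48-bit address space.
accepts_rounded_48() {
    [ "$(address_bits "$1")" -eq 48 ]
}
```

With these helpers, the 6.7 value passes both checks, while the 6.8 value fails `accepts_exact_48` but passes `accepts_rounded_48`.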

Tested with: Ubuntu 24.04 Alpha. Linux Kernel 6.8.4-lowlatency. TGL: 11th Gen Intel® Core™ i7-1185GRE @ 2.80GHz

  • 6.8.5-lowlatency kernel (the behaviour changes again with this version).

I tested a 6.8.0-rc3-based Xe KMD, and the compute/Sysman driver worked with it, so this issue seems to be specific to the i915 KMD (as expected).

I applied this commit on top of the version currently shipped by Arch Linux (23.48.27912.11), and it fixed the problem with my i5-7200U iGPU: clinfo now detects it, and I could successfully run some admittedly simple OpenCL programs on Linux 6.8.2 without any extra environment variables.

FYI @tjaalton: Ubuntu 24.04 LTS also ships a 6.8+ kernel, so its compute-runtime packages need this fix too.

Uploaded the fix to noble, thanks for the ping.

@Disty0 If the issue also happens with the 6.6 kernel, I don’t think it is related to this one. Please file a separate issue, and include the compute-runtime version and where perf reports the CPU time is being spent (run as root):

# perf record -a
<wait a min or two>
^C
# perf report -n
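If waiting and pressing ^C is inconvenient, the same capture can be done non-interactively by letting perf record for a fixed time. A sketch, to be run as root: the function name and the PERF override for dry runs are my own additions, while the perf options are standard (-a system-wide, -o output file, -i input file, -n sample counts, --stdio plain-text report).

```shell
# Record all CPUs for a fixed number of seconds (default 90), then print the report.
# Set PERF=echo to print the commands instead of running them (dry run).
capture_cpu_profile() {
    seconds=${1:-90}
    "${PERF:-perf}" record -a -o perf.data -- sleep "$seconds" &&
    "${PERF:-perf}" report -n --stdio -i perf.data
}
```

Run `capture_cpu_profile 120` while the stuck OpenCL or SYCL app is spinning, then attach the report to the new issue.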

With the new 6.8.5, 6.8.6 and 6.6.27 LTS kernels, applications cannot run on the GPU. The GPU is detected and the application tries to run on it, but it gets stuck at 100% usage of a single CPU core. This happens with any OpenCL or SYCL app. (On 6.8, I am using the workaround provided in this thread.)

On Arch Linux, you can downgrade to Linux 6.8.4 with these packages:

  • linux 6.8.4: https://archive.archlinux.org/packages/l/linux/linux-6.8.4.arch1-1-x86_64.pkg.tar.zst
  • linux-headers 6.8.4: https://archive.archlinux.org/packages/l/linux-headers/linux-headers-6.8.4.arch1-1-x86_64.pkg.tar.zst
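Both links follow the Arch Linux Archive layout `packages/<first letter>/<name>/<name>-<version>-x86_64.pkg.tar.zst`, so a small helper can build them for other versions too. This is a sketch; the helper is hypothetical and only mirrors the pattern of the two URLs above.

```shell
# Build an Arch Linux Archive URL for package NAME at VERSION (pkgver-pkgrel),
# following the layout of the two download links above.
arch_archive_url() {
    name=$1
    version=$2
    first=$(printf '%s' "$name" | cut -c1)
    echo "https://archive.archlinux.org/packages/$first/$name/$name-$version-x86_64.pkg.tar.zst"
}
```

pacman can install directly from such URLs, e.g. `sudo pacman -U "$(arch_archive_url linux 6.8.4.arch1-1)" "$(arch_archive_url linux-headers 6.8.4.arch1-1)"`.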

Release: https://github.com/intel/compute-runtime/releases/tag/24.09.28717.12

Um, its release notes say the environment-variable workaround is still needed?

A slightly newer tag includes the actual fix: https://github.com/intel/compute-runtime/compare/24.09.28717.12...24.09.28717.14

This is good news.

Note that the upcoming Ubuntu 24.04 LTS uses the non-LTS 6.8 kernel. Hopefully it can be fixed before it’s released next month. Otherwise OpenCL will not be available on many distros based on it.

In that case, will the NEO compute driver be adapted to work with the new behaviour?

We also observe the issue with the 6.8 kernel: i915 reports a different I915_CONTEXT_PARAM_GTT_SIZE.

The media and 3D drivers seem to work fine with that change, so why is it a problem for the L0/compute stack?

(I’m wondering whether this change should be reported upstream as a stable kernel ABI breakage…)

Looking at the compute-runtime code, it seems to affect SVM capability & address space size: https://github.com/intel/compute-runtime/blob/master/shared/source/os_interface/linux/product_helper_drm.cpp#L128

Whereas in the Mesa code: https://gitlab.freedesktop.org/mesa/mesa/-/blob/main/src/intel/vulkan/anv_device.c#L2300

Yes, it also works for me with the 6.7 (drm-tip) kernel, just not with 6.8 (i915 KMD).

EDIT: that was with the public Xe KMD repo, not drm-tip. With drm-tip, the issue already appears with an earlier kernel version (see below).