egl-wayland: WL Vulkan apps are broken with PRIME
Hello,
This is sort of a continuation of #41, but for Vulkan apps/games.
So Vulkan apps (like PPSSPP or vkcube) fail to work with Wayland on my PRIME setup:
$ prime-run vkcube-wayland
Selected GPU 0: NVIDIA GeForce GTX 1650 Ti, type: DiscreteGpu
[destroyed object]: error 7: failed to import supplied dmabufs: Arguments are inconsistent (for example, a valid context requires buffers not supplied by a
As you can see, it’s identical to the OpenGL error (which has already been fixed). I also checked the Wayland logs, and the (probably) NVIDIA modifier is present, so the linear modifier needs to be used somehow.
Running both PPSSPP and vkcube through XWayland avoids the problem (by setting the SDL_VIDEODRIVER=x11 variable or using the X11 vkcube executable).
And now time for the all-important system info 🐸 (although it’s kinda redundant here):
- Distro: Arch Linux
- egl-wayland version: 1.1.11 (Git version also fails)
- Mesa version: 22.2.1
- Driver version: 515.76
- Kernel version: 6.0.6
- Compositor: mutter 43.0 (through an unofficial repo)
- CPU: Ryzen 5 4600H
- GPU: Renoir iGPU + GTX 1650 Ti Mobile (as I said, a PRIME setup)
About this issue
- State: closed
- Created 2 years ago
- Reactions: 5
- Comments: 103
Commits related to this issue
- Enable vulkan presentation on Intel Mesa >= v21.2 Due to an issue with Mesa versions less than 21.2 presentation on Vulkan was forced to Nvidia only. This in itself brought new issues around the Nvid... — committed to flukejones/wgpu by flukejones 10 months ago
- Enable vulkan presentation on Intel Mesa >= v21.2 (#4110) Due to an issue with Mesa versions less than 21.2 presentation on Vulkan was forced to Nvidia only. This in itself brought new issues around... — committed to gfx-rs/wgpu by flukejones 10 months ago
- Enable vulkan presentation on Intel Mesa >= v21.2 (#4110) Due to an issue with Mesa versions less than 21.2 presentation on Vulkan was forced to Nvidia only. This in itself brought new issues around... — committed to bradwerth/wgpu by flukejones 10 months ago
- Add temporary fix for Vulkan+PRIME on Wayland It should be removed for driver v550 and later! For more details, see https://github.com/NVIDIA/egl-wayland/issues/72#issuecomment-1843446296 Signed-off... — committed to polter-rnd/nvidia-kmod by polter-rnd 7 months ago
- Update udev rules See: https://gitlab.archlinux.org/archlinux/packaging/packages/nvidia-utils/-/merge_requests/1 https://github.com/NVIDIA/egl-wayland/issues/72#issuecomment-1792908365 https://git... — committed to ventureoo/nvidia-tweaks by ventureoo 5 months ago
This feature has been implemented by @dkorkmazturk. It will be available in the next major driver version, 545 (not the recently released 535 beta).
A quick update - we have figured out what is causing the issue. It did turn out to be a driver bug affecting pre-Turing GPUs. The fix is targeted for the next driver release, 550, early next year.
Omg, I finally managed to reproduce the vkcube-wayland hang with a different GPU (Quadro P620). I’m not exactly sure what the cause is yet, but at least now it’s possible to debug. What does seem immediately clear is that it’s not a power management issue; it actually looks like it’s related to a new synchronization mechanism that was introduced in 545. I shall update with further progress. Thanks so much to everyone who provided logs, etc… that definitely helped narrow down the problem.
For anyone waiting for this update, and wondering when it may come, here’s some recent release data (only xx5 releases):
If one can extrapolate, then I’d expect a release in November '23.
Yeah, that’s true.
Also, I must ask that anyone who uses this work-around please promise to revert it once 550 is released. In the future more things will depend on sync_file support and so having it disabled will almost certainly cause problems.
I can confirm that the workaround works, but why delete the whole block? It seems that deleting the code inside the macro is enough.
Here is a patch for NixOS users:
I think we should keep this discussion focused on the PRIME problem. The yuzu crash is tracked here: https://github.com/yuzu-emu/yuzu/issues/11941
I am able to reproduce it with a debug build of the driver and will dig deeper next week. Anything I find out will be posted to the other issue I linked.
Straying from the main topic, but @kanashimia
Sorry for overlooking this initially, but I’ve confirmed with the kernel module folks on our team that the /proc/devices name has indeed been changed from nvidia-frontend to nvidia in 545. We apologize for not anticipating that this might break some workflows. They suggested something like
grep "\<nvidia\>" /proc/modules
as an alternative solution.
Same issue. Here’s a recording of what is happening with vkcube-wayland for me: https://github.com/NVIDIA/egl-wayland/assets/62414119/e978cb4f-b25b-4309-8574-e08184fd3c17
Okay, so as of 545.23.06, at least on KDE Wayland, vkcube-wayland no longer crashes but freezes immediately on startup. The cube is visible but spins very slowly, about 1 frame every 5 seconds.
Also, when I tried to run the yuzu emulator on Wayland with the Vulkan backend selected, I got the following error:
- KDE Plasma 5.27.8
- Distro: Arch Linux
- Kernel: 6.5.7-zen
Our GPUs render using a hardware-specific pixel layout which Intel and AMD GPUs don’t understand. When __NV_PRIME_RENDER_OFFLOAD=1 is set, after rendering each frame we will convert it to a linear layout so that the integrated GPU can display it. The code to do that is wired up for OpenGL and Vulkan X11 applications, and OpenGL Wayland applications, but not for Vulkan Wayland applications.
For Vulkan applications, __NV_PRIME_RENDER_OFFLOAD=1 will also enable the NV_optimus layer as you mentioned, which changes the order that GPUs are enumerated so that the NVIDIA GPU will appear first.
Well, I’m going to have to test Portal when I can.
Vulkan Wayland applications should be working correctly with 545.29.06 on Turing-or-later GPUs, including PRIME render offload.
The issue I was referring to in my previous comment was the extremely low framerates (0.2FPS) that several users had reported. All of those users had Pascal GPUs.
This appears to be fixed with the 545.29.06 driver release!
Here’s Half-Life 2, running on Wayland with Vulkan!
From Vasishath’s nvidia-bug-report.log.gz, the following snippet is interesting…
Even on my GTX1080 system, which should have the same power management features (I think), I’m not seeing weird values like that.
Back to the PRIME problem: I probably should have requested this earlier, but another thing that might help is running the nvidia-bug-report.sh script that is installed with the driver and uploading the file it generates here. Ideally, run it immediately after reproducing the bug, in case there are any relevant messages in the system log.
Neither myself nor Dogukan have been able to reproduce the issue, unfortunately.
wayland package is at version 1.22.0-1
vkcube-wayland --present_mode 1 has the same result. 1 is for VK_PRESENT_MODE_MAILBOX_KHR.
Edit: It’s the same for OpenGL apps on wayland as well. 0ad is also hanging in the same way when started with these flags:
SDL_VIDEODRIVER=wayland prime-run 0ad
I have nvidia_drm.modeset=1 in my bootargs. Is there any change needed here @erik-kz ?
This is becoming quite a serious issue for many people. Can this please be made a priority?
The 545 driver was the first version to include support for sync_files (https://www.kernel.org/doc/Documentation/sync_file.txt), a new synchronization mechanism. The bug was in our implementation of that feature. 545 also included a fairly extensive rewrite of the Vulkan Wayland WSI code, and part of that made use of the new sync_file functionality. That’s why Vulkan Wayland apps were affected by the bug.
A possible work-around would be to extract the driver installer and edit the file nvidia-drm-drv.c. In the nv_drm_get_dev_info_ioctl function, delete the following block:
…
This will disable sync_file support.
Can you share some technical details about what exactly the issue was and any workaround (other than running a background app) for the time being?
On Wed, 6 Dec, 2023, 04:54 Erik Kurzinger, @.***> wrote:
The only non-default module option I am using is “modeset=1” for nvidia-drm. As I said in my previous comment, uploading the file generated by running nvidia-bug-report.sh after reproducing the bug would be helpful.
For what it’s worth, I was able to make some progress on the wgpu hang. I have a small driver change that does fix it, although I’m still trying to understand why it only seems to be necessary for that particular application. Also, I still don’t know if that’s related to the issues with other applications (which I haven’t been able to reproduce).
This is an unrelated issue, and it’s not an NVIDIA bug. The problem is that eglgears_wayland calls poll on the Wayland socket without using wl_display_prepare_read / wl_display_read_events. See src/egl/eglut/wsi/wayland.c in the mesa-demos repo. This causes problems if there are other threads also trying to read from the socket.
@flukejones Regarding the wgpu issue, I actually can reproduce it, but I’m not sure if it’s related to the hangs other users have reported. Interestingly, if I capture a stack trace it’s a bit different than the one you posted. On my system, it doesn’t hang in vkWaitForFences but instead just spins in the winit event loop after presenting the first frame. Another thing is that setting WINIT_UNIX_BACKEND=x11 doesn’t seem to do anything for me, it still uses Wayland.
Otherwise, I spent a fair amount of time today trying to reproduce the vkcube-wayland hang on multiple machines, with different compositors, etc. but it continues to elude me.
yes
Yes. Modesetting is enabled.
Edit: I also enabled fbdev after seeing a post on the NVIDIA Linux forum, and now the vkcube animation is working, but the window is still unresponsive.
On Wed, 18 Oct, 2023, 21:45 Erik Kurzinger, @.***> wrote:
After the 1.1.12 release, it’s happening again for some OpenGL apps as well (e.g. mpv: https://github.com/mpv-player/mpv/issues/11774).
I have the same issue with an Intel + Nvidia setup. Vulkan apps with native wayland support crash at startup when using the dGPU. Running vkcube-wayland with my dGPU gives me:
RetroArch and Ryujinx gave me similar results, crashing at startup if I try to run them with prime-run. Running both apps through XWayland works like a charm, though, as does using the OpenGL API instead of Vulkan.
My system info:
- Distro: Arch Linux
- egl-wayland version: 1.1.11
- Mesa version: 22.2.3-1
- Driver version: 525.60.11
- Kernel version: 6.0.10
- Compositor: kwin 5.26.4
- CPU: Intel Core i5-12500H
- GPU: Mesa Intel® Graphics (ADL GT2) iGPU + RTX 3050 Mobile
Still present in 525.60.11 😦