wgpu: Applications using `wgpu` hang forever on bleeding edge Linux with Nvidia drivers 545.29.06 on GNOME / Wayland

Repro steps

Run anything which tries to use wgpu with Vulkan, like:

cd examples && cargo run cube

The window starts and renders at least one frame, but becomes completely non-interactive (windows can’t be interacted with or moved) and you receive a “hanging” prompt from GNOME:

Note that I think this might legitimately be a platform issue, however:

I am unable to reproduce it with either vkcube (X11) or vkcube-wayland, which reports the GPU and runs fine (see below).
And winit examples run without issues.

> sudo dnf install vulkan-tools
> vkcube-wayland
Selected GPU 0: NVIDIA GeForce RTX 2080 Ti, type: DiscreteGpu

Screencast from 2023-11-25 18-23-21.webm

So wgpu is currently the lowest level of abstraction I’ve chased down.

Platform

Log output from running the example:

wgpu_core::instance] Adapter Vulkan AdapterInfo { name: "NVIDIA GeForce RTX 2080 Ti", vendor: 4318, device: 7687, device_type: DiscreteGpu, driver: "NVIDIA", driver_info: "545.29.06", backend: Vulkan }

uname -r:

6.7.0-0.rc2.20231122gitc2d5304e6c64.23.fc40.x86_64

About this issue

State: closed
Created 7 months ago
Reactions: 1
Comments: 15 (10 by maintainers)

Most upvoted comments

I reported this issue directly to an Nvidia Linux driver dev.

ryzendew on Jan 28, 2024

for the record: Nvidia bug report by @RyzenDew https://forums.developer.nvidia.com/t/wgpu-driver-bug/280420

zocker-160 on Feb 9, 2024

@ids1024 The scenario cited is about what should happen during a device loss, which is something different from what happens here.

I don’t know this for sure, but my current understanding is that the spec doesn’t guarantee when the fence should be signaled, because the presentation engine might opt to hold onto the swapchain image for as long as it wants to. Here, that seems to be up until a new frame is submitted or presented. Android apparently does something like that so that it can use the swapchain image for things between render calls.

At least that is my conclusion from a careful read of the spec regarding the relevant functions. That doesn’t mean Nvidia might not still be interested in fixing it. That being said, the vast majority of applications do what I’ve proposed in #4967 so we probably just want to do that as well to avoid problems.

udoprog on Jan 3, 2024
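
To make the behavior described in the comment above concrete, here is a toy model in plain Rust. It uses only the standard library and makes no wgpu or Vulkan calls. The `wait_fence` helper, the `WaitResult` enum, and the "presentation engine" thread are all hypothetical names invented for this sketch, and it is not the change proposed in #4967. It only illustrates the assumption discussed above: if the fence is signaled only once the next frame is submitted, a wait issued before that submission cannot complete, so an unbounded wait would hang while a bounded one times out.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::Duration;

/// Outcome of one bounded wait on the toy "fence" (hypothetical type).
#[derive(Debug, PartialEq)]
enum WaitResult {
    Signaled,
    TimedOut,
}

/// Wait for the toy fence to be signaled, giving up after `timeout`.
fn wait_fence(fence: &mpsc::Receiver<()>, timeout: Duration) -> WaitResult {
    match fence.recv_timeout(timeout) {
        Ok(()) => WaitResult::Signaled,
        Err(RecvTimeoutError::Timeout) => WaitResult::TimedOut,
        // The signaling side is gone; for this sketch, treat it like a timeout.
        Err(RecvTimeoutError::Disconnected) => WaitResult::TimedOut,
    }
}

/// Runs the scenario once: wait before submitting a frame, then after.
fn run_demo() -> (WaitResult, WaitResult) {
    let (fence_tx, fence_rx) = mpsc::channel::<()>();
    let (submit_tx, submit_rx) = mpsc::channel::<()>();

    // Toy presentation engine: it holds on to the image and signals the
    // fence only once the next frame has been submitted.
    let engine = thread::spawn(move || {
        if submit_rx.recv().is_ok() {
            let _ = fence_tx.send(());
        }
    });

    // No frame submitted yet, so the fence is never signaled in this model.
    // The bounded wait times out instead of blocking forever.
    let before = wait_fence(&fence_rx, Duration::from_millis(50));

    // Submitting the next frame lets the engine signal the fence.
    submit_tx.send(()).expect("engine thread should still be running");
    let after = wait_fence(&fence_rx, Duration::from_secs(5));

    engine.join().expect("engine thread panicked");
    (before, after)
}

fn main() {
    let (before, after) = run_demo();
    println!("before submit: {:?}, after submit: {:?}", before, after);
}
```

In this model the first wait reports `TimedOut` and the second reports `Signaled`. Whether the real driver behaves this way is exactly the open question in the thread; the sketch only shows why the ordering of the wait and the next submission matters.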