wgpu: Dynamic dispatch layer breaks on web

Description The web backend currently has an error like this when running the hello-triangle example using cargo run-wasm --example=hello-triangle in the browser console output:

panicked at 'assertion failed: wasm.is_instance_of::<T>()', wgpu/src/backend/web.rs:53:9

Stack:

getImports/imports.wbg.__wbg_new_abda76e883ba8a5f@http://localhost:8000/hello-triangle.js:340:21
console_error_panic_hook::Error::new::h09e78aee7fd991ae@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[4182]:0xf15ec
console_error_panic_hook::hook_impl::h81c8bf5494034336@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[717]:0x66f7d
console_error_panic_hook::hook::hc9dd7d1d176aa725@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[4719]:0xf8b1e
core::ops::function::Fn::call::hc3d05c66054772b6@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[3865]:0xec751
std::panicking::rust_panic_with_hook::hb42d415afcc11f2f@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[1721]:0xaeb05
std::panicking::begin_panic_handler::{{closure}}::h57dd84c078404aa2@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[2168]:0xc3482
std::sys_common::backtrace::__rust_end_short_backtrace::hef94dcc71d3b37f5@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[5052]:0xfcef3
rust_begin_unwind@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[3238]:0xe0f9f
core::panicking::panic_fmt::h32de9c76c9d5eb0c@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[4581]:0xf6f1d
core::panicking::panic::hff08c7cf90edd9fe@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[3862]:0xec692
<wgpu::backend::web::Identified<T> as core::convert::From<wgpu::context::ObjectId>>::from::h09e99484006351c1@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[1664]:0xaba02
<T as wgpu::context::DynContext>::adapter_request_device::he51a22a86469d596@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[668]:0x61f45
wgpu::Adapter::request_device::hdaaa471e221d3c58@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[1073]:0x856fc
hello_triangle::run::{{closure}}::h19b0046d2265909e@http://localhost:8000/hello-triangle_bg.wasm:wasm-function[255]:0xb1d7

... Rest of call stack trimmed for brevity

Now this is not an error that is caused by mismatching types. By inserting some debugging aid log statements we see the following console output before the panic:

(0) Upcast(id = 74): wgpu::backend::web::Identified<web_sys::features::gen_GpuCanvasContext::GpuCanvasContext> [hello-triangle.js:489:17](http://localhost:8000/hello-triangle.js)
(1) Upcast(id = 72): wgpu::backend::web::Identified<web_sys::features::gen_GpuAdapter::GpuAdapter> [hello-triangle.js:489:17](http://localhost:8000/hello-triangle.js)
(2) Downcasting(id = 72): wgpu::backend::web::Identified<web_sys::features::gen_GpuAdapter::GpuAdapter> [hello-triangle.js:489:17](http://localhost:8000/hello-triangle.js)
(3) Downcasting(id = 72): wgpu::backend::web::Identified<web_sys::features::gen_GpuAdapter::GpuAdapter> [hello-triangle.js:489:17](http://localhost:8000/hello-triangle.js)
panicked at 'assertion failed: wasm.is_instance_of::<T>()', wgpu/src/backend/web.rs:61:9
...

What we see here is that the failure occurs on the second downcast of the GpuAdapter. The reason this occurs is because of how the web backend currently converts an ObjectId into whatever associated type is desired. In particular the web backend will use the ObjectId and convert the raw number right into whatever wasm_bindgen type was desired (such as GpuAdapter). Currently the web backend uses the IntoWasmAbi and FromWasmAbi traits to do this. However there is a less known side effect of this. FromWasmAbi and IntoWasmAbi transfer ownership of the JS object. When dropped, it will be destroyed on the JS side. Further use of the object when downcast again will of course fail since the actual object on the JS side was destroyed. It just happens to be caught easily by the is_instance_of check with strict asserts.

When running in release mode (to disable strict asserts) a different error shows up:

Uncaught (in promise) TypeError: getObject(...).requestDevice is not a function
    __wbg_requestDevice_7dbb93ae9dc5edc1 http://localhost:8000/hello-triangle.js:500
    __wbg_adapter_49 http://localhost:8000/hello-triangle.js:229
    real http://localhost:8000/hello-triangle.js:202
    promise callback*getImports/imports.wbg.__wbg_then_11f7a54d67b4bfad http://localhost:8000/hello-triangle.js:1128
    __wbg_adapter_49 http://localhost:8000/hello-triangle.js:229
    real http://localhost:8000/hello-triangle.js:202
    promise callback*getImports/imports.wbg.__wbg_then_cedad20fbbd9418a http://localhost:8000/hello-triangle.js:1132
    __wbg_adapter_49 http://localhost:8000/hello-triangle.js:229
    real http://localhost:8000/hello-triangle.js:202
    promise callback*getImports/imports.wbg.__wbg_then_11f7a54d67b4bfad http://localhost:8000/hello-triangle.js:1128
    finalizeInit http://localhost:8000/hello-triangle.js:1240
    init http://localhost:8000/hello-triangle.js:1272
    async* http://localhost:8000/:17
    EventListener.handleEvent* http://localhost:8000/:16

The solution to fix this would be to avoid taking ownership of the object when downcasting. This is easier said than done however. You need to make sure you destroy the object on drop to prevent memory leaks. The casting can be done, but we can only receive a ManuallyOwned<T>. And of course trying to return a reference to an object defined in a function is not allowed. This means there are one of two solutions left:

  1. Replace ObjectId with a Box<dyn Whatever>

This would be an easy solution, but will cause a definite increase in memory allocations. It might not be a huge concern on either web or native as the largest allocation size would essentially be a pointer or two. Profiling would make sense here, but of course best to avoid an allocation heavy pattern if possible in the first place.

  1. The web backend should store the underling Gpu* objects on the XXXXData associated types.

This does mean the ObjectId on web will become effectively inert (unless the expose-ids) feature is enabled. This will also automatically deal with the concerns around memory leaks by downcasting.

This solution however calls into question whether the dynamic dispatch layer should even have separate Data and Id types (consolidating instead on a Data type only. But the currently native implementation uses some Data associated types and future out of tree backends could use both.

Other notes

I tested this on firefox nightly (webgpu does not work on Linux at all on Chrome dev, no canary). The same error does appear in chrome dev even though the webgpy implementation on chrome is broken.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 28 (13 by maintainers)

Commits related to this issue

Most upvoted comments

I wasn’t working on it as I thought @i509VCB was working on it - I can prioritize it as it seems to be blocking.

I have a PR that should fix this at #3657 - would be great if some larger projects could test against my branch to see if I missed anything. I found it really easy to make mistakes doing the downcasting (would be great to improve this somehow), so it’s possible there are some other paths that I didn’t hit while testing.

image

So there’s a way to work around this massive (UAF-shaped) unsoundness issue. But it’s not a good way to do it. Basically you just comment out the dropObject function body in the resulting .js file:

function dropObject(idx) {
    // if (idx < 132) return;
    // heap[idx] = heap_next;
    // heap_next = idx;
}

No more free, no more problems (okay you’re now leaking, which could be a problem for any transient objects).

After about a minute or two, it doesn’t look too bad though (this starts out around 200 entries): image

The risk here I suppose would be large objects being kept around, not necessarily many objects.


However, we can limit this to just WebGPU objects (i.e. the ones wgpu::backend::web is constantly freeing repeatedly due to ObjectId being Copy etc.):

const webgpu_classes = Object.getOwnPropertyNames(window)
    .filter(k => k.startsWith("GPU") && typeof window[k] === 'function')
    .map(k => window[k]);
const is_webgpu_obj = o => webgpu_classes.some(webgpu_class => o instanceof webgpu_class);
function dropObject(idx) {
    if(idx < 132) return;
    if(is_webgpu_obj(heap[idx])) return;
    heap[idx] = heap_next;
    heap_next = idx;
}

After triggering significantly more redraws than with the simpler method, I get: image

And while Chrome itself using 1.6 GiB, the JS heap size is shown as 3 MiB, with less than 1 MiB leaked by that heap array (based on the contents, if I had to guess it’s mostly GPURenderPassEncoder).

I’m also getting some lag-spikes but I think those are from XWayland (as I couldn’t get WebGPU working natively).

Speaking of, this works with stable Chrome (Chrome/111.0.5563.64):

WAYLAND_DISPLAY= google-chrome-stable --enable-features=Vulkan --enable-unsafe-webgpu http://localhost:8000

And just as well with google-chrome-unstable (Chrome/113.0.5638.0 - that’s the version I did most experimenting with, since I wasn’t sure if the issue was older Chrome, but nope, both work just as well).

So the Context trait defines a bunch of assoicated types. My idea was to no longer downcast from ObjectId (making it just a global id) and store the GPUTexture for example inside the TextureData assoicated type. The same with every other type that holds a web_sys::GPU type

~I started working on this in https://github.com/grovesNL/wgpu/tree/dynamic-dispatch-workaround but it’s not clear how we could make this work without bigger changes to how ObjectId is used. I might be doing something wrong and misunderstood the approach you suggested though.~

~For example, Buffer is defined as~

pub struct Buffer {
    id: ObjectId,
    /* ... snip ... */
}

~A lot of functions just work with e.g., buffer.id, so we’d ideally have a way to get back toContext::BufferData from the ObjectId. I don’t see a great way we could do that without going through the wasm ABI conversions again. In my branch you can remove the impl<T: FromWasmAbi<Abi = u32> + JsCast> From<ObjectId> for Sendable<T> to see most of the places I’m not sure about.~

We also have some places accepting iterators like fn queue_submit<I: Iterator<Item = Self::CommandBufferId>> where we’d like to access Self::CommandBufferData instead. I’m not sure how we can deal with those either.

Edit: I think I’m past the problems with types like buffer now - still not sure about iterators

I’d like to change signatures like fn queue_submit<I: Iterator<Item = Self::CommandBufferId>> to include data with something like fn queue_submit<I: Iterator<Item = (Self::CommandBufferId, Self::CommandBufferData)>> but not sure how to make this work because of the type erasure.

Edit2: Figured out how to get past the iterator problem I was running into. Still working on filling in some other places now, but I have hello-triangle working again image

I’ll keep updating https://github.com/grovesNL/wgpu/tree/dynamic-dispatch-workaround as I work through it and open a PR once it’s in slightly better shape.

How is the progress in solving this, as it seems to completely break wgpu 0.15 on the web for me and others?

Is there any suggested fix or workaround, or what is the plan for solving this issue?

I think the issue needs to be reopened or am I wrong?

I will try to reduce the code to get a better understanding about the problem. I thought the same about the offscreen canvas part, but the integration test with wasm-pack is a pure canvas, because we can’t test in worker for now.

https://github.com/codeart1st/wgpu-layers/blob/main/tests/utils.rs#L33

More of the stack:

console.<computed> | @ | (index):16
-- | -- | --
  | (anonymous) | @ | wasm-bindgen-test:766
  | logError | @ | wasm-bindgen-test:214
  | imports.wbg.__wbg_error_f851667af71bcfc6 | @ | wasm-bindgen-test:764
  | $console_error_panic_hook::error::haa6344a57d53fd1a | @ | wasm-bindgen-test_bg.wasm:0x2fdebb
  | $console_error_panic_hook::hook_impl::hf098363f36624c01 | @ | wasm-bindgen-test_bg.wasm:0x1ec484
  | $console_error_panic_hook::hook::habf82d241e6461f3 | @ | wasm-bindgen-test_bg.wasm:0x410ec6
  | $core::ops::function::Fn::call::h2a5bfecfee992b11 | @ | wasm-bindgen-test_bg.wasm:0x3eb036
  | $<alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h3d3e4e50cd87f54b | @ | wasm-bindgen-test_bg.wasm:0x39edc3
  | $std::panicking::rust_panic_with_hook::h4e15c82b021ecc52 | @ | wasm-bindgen-test_bg.wasm:0x83edc
  | $std::panicking::begin_panic_handler::{{closure}}::h6314a820b4606399 | @ | wasm-bindgen-test_bg.wasm:0x2028f8
  | $std::sys_common::backtrace::__rust_end_short_backtrace::h902fac193e8faf01 | @ | wasm-bindgen-test_bg.wasm:0x36a345
  | $rust_begin_unwind | @ | wasm-bindgen-test_bg.wasm:0x34b3e0
  | $core::panicking::panic_fmt::h04349fcbd66ae3be | @ | wasm-bindgen-test_bg.wasm:0x36422b
  | $core::panicking::panic::h88c1f21fc10f4ca5 | @ | wasm-bindgen-test_bg.wasm:0x338204
  | $wgpu::context::downcast_ref::h42cc43d9f477ee06 | @ | wasm-bindgen-test_bg.wasm:0x34e349
  | $<T as wgpu::context::DynContext>::surface_get_capabilities::h627c650a19c22ea8 | @ | wasm-bindgen-test_bg.wasm:0x2dbfaa
  | $wgpu::Surface::get_capabilities::h1c786be24397c360 | @ | wasm-bindgen-test_bg.wasm:0x2bc9c9
  | $wgpu_layers::renderer::Renderer::new::{{closure}}::h508a1e824ff88dba | @ | wasm-bindgen-test_bg.wasm:0x375f5
  | $wgpu_layers::init::{{closure}}::h5b03158c392bbbee | @ | wasm-bindgen-test_bg.wasm:0x16a76d
  | $wgpu_layers::wasm::start_with_canvas::{{closure}}::h0b346edc917e180c | @ | wasm-bindgen-test_bg.wasm:0x13ff8b
  | $render_test::empty::{{closure}}::hc1b9759c6c9a0c78 | @ | wasm-bindgen-test_bg.wasm:0xd9422
  | $wasm_bindgen_test::__rt::Context::execute_async::{{closure}}::hbda9f82ade52a0f1 | @ | wasm-bindgen-test_bg.wasm:0x1bb160
  | $<wasm_bindgen_test::__rt::TestFuture<F> as core::future::future::Future>::poll::{{closure}}::{{closure}}::h79a9c56f52d3866a | @ | wasm-bindgen-test_bg.wasm:0x2eb8c4
  | $wasm_bindgen::convert::closures::invoke0_mut::ha7ed5b3e74ea5db6 | @ | wasm-bindgen-test_bg.wasm:0x30804d
  | __wbg_adapter_63 | @ | wasm-bindgen-test:454
  | cb0 | @ | wasm-bindgen-test:712
  | window.__wbg_test_invoke | @ | (index):26
  | (anonymous) | @ | wasm-bindgen-test:717
  | handleError | @ | wasm-bindgen-test:459
  | imports.wbg.__wbg_wbgtestinvoke_00cf1ac27b273033 | @ | wasm-bindgen-test:705
  | $wasm_bindgen_test::__rt::__wbg_test_invoke::h5503f87590057aa5 | @ | wasm-bindgen-test_bg.wasm:0x247ecf
  | $<wasm_bindgen_test::__rt::TestFuture<F> as core::future::future::Future>::poll::{{closure}}::ha5d162b17e4aac7a | @ | wasm-bindgen-test_bg.wasm:0x30d888
  | $scoped_tls::ScopedKey<T>::set::h03b4b7a9ba002906 | @ | wasm-bindgen-test_bg.wasm:0x29c2fd
  | $<wasm_bindgen_test::__rt::TestFuture<F> as core::future::future::Future>::poll::h910a1737cc6c3dca | @ | wasm-bindgen-test_bg.wasm:0x109362
  | $<wasm_bindgen_test::__rt::ExecuteTests as core::future::future::Future>::poll::hb676a32c2fa3065f | @ | wasm-bindgen-test_bg.wasm:0x8b767
  | $wasm_bindgen_test::__rt::Context::run::{{closure}}::he6d38dd0cefc076d | @ | wasm-bindgen-test_bg.wasm:0x1b285a
  | $wasm_bindgen_futures::future_to_promise::{{closure}}::{{closure}}::he0566893a7827a9d | @ | wasm-bindgen-test_bg.wasm:0x11d2aa
  | $wasm_bindgen_futures::task::multithread::Task::run::h93fbcdb073ef8b48 | @ | wasm-bindgen-test_bg.wasm:0x1414f0
  | $wasm_bindgen_futures::task::multithread::Task::spawn::{{closure}}::h2534fabdb3bf56f6 | @ | wasm-bindgen-test_bg.wasm:0x348ae0
  | $<dyn core::ops::function::FnMut<(A,)>+Output = R as wasm_bindgen::closure::WasmClosure>::describe::invoke::hf0acd6cac6bfa8a6 | @ | wasm-bindgen-test_bg.wasm:0x2ca927
  | __wbg_adapter_39 | @ | wasm-bindgen-test:276
  | real | @ | wasm-bindgen-test:253
  | Promise.then (async) |   |  
  | (anonymous) | @ | wasm-bindgen-test:1511
  | logError | @ | wasm-bindgen-test:214
  | imports.wbg.__wbg_then_ec5db6d509eb475f | @ | wasm-bindgen-test:1510
  | $js_sys::Promise::then::h0b2c7f0db6c57706 | @ | wasm-bindgen-test_bg.wasm:0x3730c7
  | $wasm_bindgen_futures::task::multithread::Task::run::h93fbcdb073ef8b48 | @ | wasm-bindgen-test_bg.wasm:0x141666
  | $wasm_bindgen_futures::task::multithread::Task::spawn::{{closure}}::h2534fabdb3bf56f6 | @ | wasm-bindgen-test_bg.wasm:0x348ae0
  | $<dyn core::ops::function::FnMut<(A,)>+Output = R as wasm_bindgen::closure::WasmClosure>::describe::invoke::hf0acd6cac6bfa8a6 | @ | wasm-bindgen-test_bg.wasm:0x2ca927
  | __wbg_adapter_39 | @ | wasm-bindgen-test:276
  | real | @ | wasm-bindgen-test:253
  | Promise.then (async) |   |  
  | (anonymous) | @ | wasm-bindgen-test:1511
  | logError | @ | wasm-bindgen-test:214
  | imports.wbg.__wbg_then_ec5db6d509eb475f | @ | wasm-bindgen-test:1510
  | $js_sys::Promise::then::h0b2c7f0db6c57706 | @ | wasm-bindgen-test_bg.wasm:0x3730c7
  | $wasm_bindgen_futures::task::multithread::Task::run::h93fbcdb073ef8b48 | @ | wasm-bindgen-test_bg.wasm:0x141666
  | $wasm_bindgen_futures::queue::QueueState::run_all::h958c10b773927cf1 | @ | wasm-bindgen-test_bg.wasm:0x17c4d2
  | $wasm_bindgen_futures::queue::Queue::new::{{closure}}::h7eaa64eab4ae83b0 | @ | wasm-bindgen-test_bg.wasm:0x348a62
  | $<dyn core::ops::function::FnMut<(A,)>+Output = R as wasm_bindgen::closure::WasmClosure>::describe::invoke::hf0acd6cac6bfa8a6 | @ | wasm-bindgen-test_bg.wasm:0x2ca927
  | __wbg_adapter_39 | @ | wasm-bindgen-test:276
  | real | @ | wasm-bindgen-test:253
  | Promise.then (async) |   |  
  | (anonymous) | @ | wasm-bindgen-test:1511
  | logError | @ | wasm-bindgen-test:214
  | imports.wbg.__wbg_then_ec5db6d509eb475f | @ | wasm-bindgen-test:1510
  | $js_sys::Promise::then::h0b2c7f0db6c57706 | @ | wasm-bindgen-test_bg.wasm:0x3730c7
  | $wasm_bindgen_futures::queue::Queue::schedule_task::h51ed02609d74f0d6 | @ | wasm-bindgen-test_bg.wasm:0x1f89e8
  | $wasm_bindgen_futures::task::multithread::Task::spawn::{{closure}}::h2f65c6375e41c6e5 | @ | wasm-bindgen-test_bg.wasm:0x3f92c1
  | $std::thread::local::LocalKey<T>::try_with::hbb0d1b9f9f4424a1 | @ | wasm-bindgen-test_bg.wasm:0x27b4f5
  | $std::thread::local::LocalKey<T>::with::h4d7fc6784dfd3778 | @ | wasm-bindgen-test_bg.wasm:0x37e906
  | $wasm_bindgen_futures::task::multithread::Task::spawn::h08edc717377ad9f1 | @ | wasm-bindgen-test_bg.wasm:0x12fa02
  | $wasm_bindgen_futures::spawn_local::h23c7a18a7df5b49c | @ | wasm-bindgen-test_bg.wasm:0x1c73f7
  | $wasm_bindgen_futures::future_to_promise::{{closure}}::h715c0534ead3bd7e | @ | wasm-bindgen-test_bg.wasm:0x2422e8
  | $wasm_bindgen::convert::closures::invoke2_mut::h163672718a4a560f | @ | wasm-bindgen-test_bg.wasm:0x2b07e3
  | __wbg_adapter_413 | @ | wasm-bindgen-test:478
  | cb0 | @ | wasm-bindgen-test:1495
  | (anonymous) | @ | wasm-bindgen-test:1500
  | logError | @ | wasm-bindgen-test:214
  | imports.wbg.__wbg_new_9d3a9ce4282a18a8 | @ | wasm-bindgen-test:1488
  | $js_sys::Promise::new::hfe2ac748ec399f05 | @ | wasm-bindgen-test_bg.wasm:0x3602f9
  | $wasm_bindgen_futures::future_to_promise::h142703004395576f | @ | wasm-bindgen-test_bg.wasm:0x2c2c13
  | $wasm_bindgen_test::__rt::Context::run::h0b07b3ab22dfcbee | @ | wasm-bindgen-test_bg.wasm:0xddd1a
  | $wasmbindgentestcontext_run | @ | wasm-bindgen-test_bg.wasm:0x247e30
  | run | @ | wasm-bindgen-test:569
  | main | @ | run.js:33
  | await in main (async) |   |  
  | (anonymous) | @ | run.js:39


If i don’t rely on get_capabilities i get the same error for calling configure on my swapchain surface.

I have a PR that should fix this at #3657 - would be great if some larger projects could test against my branch to see if I missed anything. I found it really easy to make mistakes doing the downcasting (would be great to improve this somehow), so it’s possible there are some other paths that I didn’t hit while testing.

I can confirm that this works with maplibre-rs

@grovesNL I just tested https://compute.toys/ against your branch and can confirm it resolves this issue (and didn’t notice any breakages)

Would it be possibile to keep the specific type inside the Identified and use the RefFromwasmAbi in order to not drop it in some way?

It would be possible via having the dynamic dispatch layer downcast IDs into a ManuallyDrop and then get a reference from that. The other solution I have in mind is to use the Data assoicated types to store the JS objects and leave the id to just being the wasm repr (without taking out the object) + global id.

I was digging a bit into it yesterday, starting from what is happening on Surface, in order to see if I can implement this. The idea then would be to have the data storing directly the object itself but it’s not super clear to me how you would use the ObjectId and how do you imagine to do the conversions between the two (or maybe I didn’t understand completely what is your idea 😅)

So the Context trait defines a bunch of assoicated types. My idea was to no longer downcast from ObjectId (making it just a global id) and store the GPUTexture for example inside the TextureData assoicated type. The same with every other type that holds a web_sys::GPU type