wasmtime: Access Violation in .NET application (with `wasmtime-dotnet`) when using current Wasmtime build
Hi,
we are using Wasmtime via `wasmtime-dotnet` in a .NET 6.0 application, where we load an 8 MB WASM file and execute its `_start` function. However, this causes an access violation on Windows in `wasmtime_func_call` when using Wasmtime from commit ff5abfd9938c78cd75c6c5d6a41565097128c5d3.
The crash started to happen together with issue #5328 (with commit fc62d4ad65892837a1e1af5a9d67c8b0d4310efe), which is why I also reported it there, but it could be that this was just a trigger and the actual root cause is a different one. It could also be that once #5328 is fixed, the access violation is no longer reproducible.
I can reproduce the access violation with the following steps on Windows 10 x64 Build 19044, and on Windows Server 2019:
- Build the Wasmtime C API from commit ff5abfd9938c78cd75c6c5d6a41565097128c5d3 on Windows for the current host (`x86_64-pc-windows-msvc`) as a release build: `cargo build --release --manifest-path crates/c-api/Cargo.toml`
- Install the .NET SDK 6.0.403 for Windows x64, which includes the .NET 6.0.11 runtime.
- Clone https://github.com/kpreisser/wasmtime-dotnet and switch to branch `repro-wasmtime-access-violation`, which contains a commit that adds the repro project on top of the current `wasmtime-dotnet` code. (The code contains a lot of fields/variables and some methods that are not used, but that was the best I could do while still reproducing the crash.)
- `cd` into the repo’s `ReproWasmtimeAccessViolation` folder, then run `dotnet build ReproWasmtimeAccessViolation.csproj`.
- `cd bin\Debug\net6.0`, then copy `wasmtime.dll` and `wasmtime.pdb` from the build result into this directory.
- Download the WASM file as a .zip file here (note: rename the .mov file to .zip; I renamed it to be able to upload it here): https://user-images.githubusercontent.com/13289184/203989968-4639d6d5-15c9-4e57-bf01-c2eccd756642.mov, and copy the file `dotnet-codabix-wasm-build.wasm` into this directory as well.
- Run `ReproWasmtimeAccessViolation.exe`.
After a few seconds, the program will exit if the access violation occurred, and Windows Error Reporting will add an error entry (“Application Error”) to the Application log.
If it didn’t occur, the program will print "wasmtime_func_call completed! This should not be displayed if the Access Violation occured.", and you can use Ctrl+C to exit.
Unfortunately, the crash doesn’t always occur; it might also depend on the hardware or other factors. If it doesn’t occur, you may have to try a few times, or try again at a later time.
When attaching Visual Studio 2022 as debugger (using “native code” mode), it will show the following:
Exception thrown at 0x000002BB0392BD19 in ReproWasmtimeAccessViolation.exe: 0xC0000005: Access violation writing location 0x000000CA214E4E80.
Call Stack:
> 000002bb0392bd19() Unknown
000002bb0392b8c4() Unknown
000002bb0429bb8c() Unknown
000002bb03e5b2b2() Unknown
000002bb03e616c0() Unknown
000002bb03e5d4a3() Unknown
000002bb03a369dc() Unknown
000002bb03951aaf() Unknown
000002bb0392b8c4() Unknown
000002bb0429bb8c() Unknown
000002bb03e5b2b2() Unknown
000002bb03e616c0() Unknown
000002bb03e5d4a3() Unknown
000002bb038caa53() Unknown
000002bb03a2c701() Unknown
000002bb039f78bc() Unknown
000002bb0392b8c4() Unknown
000002bb0429bb8c() Unknown
000002bb03e5b2b2() Unknown
000002bb03e616c0() Unknown
000002bb03e5d4a3() Unknown
000002bb038caa53() Unknown
000002bb03a2c701() Unknown
000002bb039f78bc() Unknown
000002bb0392b8c4() Unknown
000002bb0429bb8c() Unknown
000002bb03e5b2b2() Unknown
000002bb03e5a965() Unknown
000002bb03ac8ee0() Unknown
000002bb03ac3058() Unknown
000002bb04294c00() Unknown
000002bb042df4c4() Unknown
000002bb03aaa51f() Unknown
000002bb0382174a() Unknown
000002bb038210d4() Unknown
000002bb044cf7bc() Unknown
000002bb044cfdbb() Unknown
wasmtime.dll!wasmtime_setjmp() C
wasmtime.dll!_ZN16wasmtime_runtime12traphandlers84_$LT$impl$u20$wasmtime_runtime..traphandlers..call_thread_state..CallThreadState$GT$4with17h74c010e2e721401eE() Unknown
wasmtime.dll!_ZN16wasmtime_runtime12traphandlers11catch_traps17h7bef31bead668a92E() Unknown
wasmtime.dll!_ZN8wasmtime4func4Func18call_unchecked_raw17hcb0b0b7982c6fc66E.llvm.11183561978299839151() Unknown
wasmtime.dll!_ZN8wasmtime4func4Func9call_impl17hcb50041b4478fa06E.llvm.11183561978299839151() Unknown
wasmtime.dll!wasmtime_func_call() Unknown
00007ffd5615ce96() Unknown
00007ffd5615cc68() Unknown
00007ffd5615c826() Unknown
00007ffd5615c66b() Unknown
00007ffd5615a56b() Unknown
00007ffd5615a0ac() Unknown
00007ffd55f3fd86() Unknown
00007ffd55f3ef8b() Unknown
System.Private.CoreLib.dll!00007ffd3609274f() Unknown
coreclr.dll!00007ffdb5aaaac3() Unknown
coreclr.dll!00007ffdb59a6d9c() Unknown
coreclr.dll!00007ffdb5a8c413() Unknown
coreclr.dll!00007ffdb59836f5() Unknown
coreclr.dll!00007ffdb59835fa() Unknown
coreclr.dll!00007ffdb5983419() Unknown
kernel32.dll!00007ffdeafe74b4() Unknown
ntdll.dll!00007ffdeb2c26a1() Unknown
Additional notes:
- On my system, the crash reproduces in about 50% of the cases.
- I can’t seem to reproduce the crash when using Wasmtime debug build.
Environments where I can reproduce the crash:
- Environment 1:
  - Hardware: Apple Mac mini (2018), with Parallels Desktop to run the Windows VM
  - CPU: Intel® Core™ i7-8700B CPU @ 3.20GHz 3.19 GHz
  - VM (Parallels Desktop on macOS): Windows 10 Pro x64 Build 19044
- Environment 2:
  - Hardware: Intel NUC8i7BEH
  - CPU: Intel Core i7-8559U
  - VM (VMware Player 16 on Windows 10 Build 19044): Windows Server 2019 Build 17763
Thank you!
About this issue
- State: closed
- Created 2 years ago
- Comments: 24 (24 by maintainers)
Commits related to this issue
- wasmtime: enable stack probing for x86_64 targets. This commit unconditionally enables stack probing for x86_64 targets. On Windows, stack probing is always required because of the way Windows commi... — committed to peterhuene/wasmtime by peterhuene 2 years ago
- wasmtime: enable stack probing for x86_64 targets. (#5350) * wasmtime: enable stack probing for x86_64 targets. This commit unconditionally enables stack probing for x86_64 targets. On Windows,... — committed to bytecodealliance/wasmtime by peterhuene 2 years ago
Indeed! I’ve opened https://github.com/bytecodealliance/wasmtime/pull/5353 to add support for aarch64
Yes, I’ll create a patch to enable it unconditionally (for x86_64).
Not yet, no, and you can’t do that today due to the issue @jameysharp mentioned above. @peterhuene were you going to work on a patch to enable that?
Personally I think it’s reasonable to unconditionally enable this for all supported platforms, since stack probes shouldn’t have much of a perf impact anyway. This is only implemented for x86_64, though: other platforms don’t have inline stack probes implemented yet, and outlined probing support would need more changes in Wasmtime that we may not yet be ready for.
Here’s a simplified wat that reproduces the problem. It has a function with a frame size of 80192 bytes, and it consistently causes `wasmtime repro.wat` to crash:
By eliding the host compatibility check in the engine and enabling stack probing:
I think we can confidently say that it’s due to a missing probe for these large frame allocations.
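To make the probe arithmetic concrete: an inline stack probe touches one address in every 4 KiB page between the old and the new stack pointer before the frame is used, so each touch lands on the current guard page. The Rust sketch below is a simplified model, not Cranelift’s actual code generation; the helper name `probe_offsets` is hypothetical, and it assumes a 4 KiB page size. It computes the probe offsets for the 80192-byte frame from the simplified repro.

```rust
/// Page size assumed for this sketch (4 KiB, as on x86_64 Windows).
const PAGE_SIZE: u64 = 4096;

/// Hypothetical helper: offsets (in bytes below the incoming stack pointer)
/// that an inline probe sequence would touch before a frame of `frame_size`
/// bytes is used. One probe per page strictly inside the frame; the last
/// partial page (less than 4 KiB) is then reached by the frame's own stores,
/// which can at most hit the freshly moved guard page.
fn probe_offsets(frame_size: u64) -> Vec<u64> {
    (1..)
        .map(|page| page * PAGE_SIZE)
        .take_while(|&offset| offset < frame_size)
        .collect()
}

fn main() {
    // Frame size of the function in the simplified repro.
    let probes = probe_offsets(80_192);
    // Prints "19 probes, deepest at 77824 bytes below SP".
    println!(
        "{} probes, deepest at {} bytes below SP",
        probes.len(),
        probes.last().copied().unwrap_or(0)
    );
    // Frames of at most one page need no probe at all.
    assert!(probe_offsets(4096).is_empty());
}
```

The remaining 80192 - 77824 = 2368 bytes fit within one page below the deepest probe, which is why probing once per page is enough.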
FYI, I believe function 311 in the module is the faulting function, as it has thousands of locals (lol: `wasm-tools print` prints to column 150221 for the locals declaration)!
Also, I will note that the default thread stack size for a 64-bit .NET Core apphost is 1.5 MiB, and at the fault there was still plenty of wasm stack space left (hence why it passed the prologue’s stack overflow check); it just didn’t probe to get Windows to commit more pages.
@bjorn3 thanks for the reminder as I’ve willfully purged my Windows knowledge these past few years!
This is indeed what is happening here: the frame is larger than the stack commit size in the PE header (4 KiB), so it’s not triggering the fault on the guard page that would make Windows commit more stack pages.
We should definitely be forcing a probe on Windows for any frame larger than 4 KiB at a minimum.
On Windows we need stack probes anyway to grow the stack. As I understand it, on Windows only a small part of the stack is committed by default, and directly below it is a guard page which, if accessed, will cause the stack to grow a bit. If you access even a single page below the guard page, you will get an AV even though it is still in the part of the memory reserved for the stack. cg_clif also had access violations on Windows in some cases, which it fixed by enabling inline stack probes. cc https://github.com/bjorn3/rustc_codegen_cranelift/issues/1261#issuecomment-1219307630
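The guard-page behavior described above can be illustrated with a small simulation. The Rust sketch below is a deliberately simplified model, not the actual Windows memory manager: it assumes 4 KiB pages, a single guard page directly below the committed region, and a 1.5 MiB reservation (the .NET apphost default mentioned in this thread). The `StackModel` type and `enter_frame` function are hypothetical. Writing the lowest byte of an 80192-byte frame without probing faults even though the address is inside the reservation, while probing each page first lets the guard page walk down one page at a time.

```rust
/// Page size assumed by this model (4 KiB).
const PAGE_SIZE: u64 = 4096;

#[derive(Debug, PartialEq)]
enum Access {
    Ok,
    AccessViolation,
}

/// Hypothetical model of a Windows thread stack. Offsets count bytes downward
/// from the top of the stack; pages `0..committed_pages` are committed and
/// page `committed_pages` is the guard page.
struct StackModel {
    committed_pages: u64,
    reserved_pages: u64,
}

impl StackModel {
    /// Touches the byte `offset` bytes below the stack top.
    fn touch(&mut self, offset: u64) -> Access {
        let page = offset / PAGE_SIZE;
        if page < self.committed_pages {
            Access::Ok
        } else if page == self.committed_pages && page < self.reserved_pages {
            // Guard page hit: commit it and move the guard one page down.
            self.committed_pages += 1;
            Access::Ok
        } else {
            // Reserved but not committed (or beyond the reservation): fault.
            Access::AccessViolation
        }
    }
}

/// Enters a frame of `frame_size` bytes whose top is `sp` bytes below the
/// stack top, then writes its lowest byte. With `probe`, every page in
/// between is touched first, from top to bottom.
fn enter_frame(stack: &mut StackModel, sp: u64, frame_size: u64, probe: bool) -> Access {
    debug_assert!(frame_size > 0);
    if probe {
        let mut offset = sp + PAGE_SIZE;
        while offset < sp + frame_size {
            if stack.touch(offset) == Access::AccessViolation {
                return Access::AccessViolation;
            }
            offset += PAGE_SIZE;
        }
    }
    stack.touch(sp + frame_size - 1)
}

fn main() {
    let reserved_pages = 1536 * 1024 / PAGE_SIZE; // 1.5 MiB reservation = 384 pages
    for probe in [false, true] {
        let mut stack = StackModel { committed_pages: 1, reserved_pages };
        let result = enter_frame(&mut stack, 0, 80_192, probe);
        println!("probe={probe}: {result:?}, committed pages: {}", stack.committed_pages);
    }
}
```

Without probing, the single write lands 19 pages below the committed page and faults; with probing, 19 guard-page hits commit the pages in order and the final write succeeds.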
@kpreisser thanks for the report and the additional details. I’ll investigate this immediately and follow up when I have new information to share.