wasmtime: Access Violation in .NET application (with `wasmtime-dotnet`) when using current Wasmtime build

Hi,

we are using Wasmtime via wasmtime-dotnet in a .NET 6.0 application, where we load an 8 MB WASM file and execute its _start function. However, this causes an access violation on Windows in wasmtime_func_call when using Wasmtime from commit ff5abfd9938c78cd75c6c5d6a41565097128c5d3.

The crash started to happen together with issue #5328 (with commit fc62d4ad65892837a1e1af5a9d67c8b0d4310efe), which is why I also reported it there, but it could be that this was just a trigger and the actual root cause is a different one. It could also be that once #5328 is fixed, the access violation is no longer reproducible.

I can reproduce the access violation with the following steps on Windows 10 x64 Build 19044, and on Windows Server 2019:

  • Build the Wasmtime C API from commit ff5abfd9938c78cd75c6c5d6a41565097128c5d3 on Windows for the current host (x86_64-pc-windows-msvc) as release build: cargo build --release --manifest-path crates/c-api/Cargo.toml
  • Install the .NET SDK 6.0.403 for Windows x64, which includes the .NET 6.0.11 runtime.
  • Clone https://github.com/kpreisser/wasmtime-dotnet, and switch to branch repro-wasmtime-access-violation, which contains a commit that adds the repro project on top of the current wasmtime-dotnet code. (The code contains a lot of fields/variables and some methods that are not used, but this is as far as I could reduce it while still reproducing the crash.)
  • cd into the repo’s ReproWasmtimeAccessViolation folder, then run dotnet build ReproWasmtimeAccessViolation.csproj.
  • cd bin\Debug\net6.0, then copy wasmtime.dll and wasmtime.pdb from the build result into this directory.
  • Download the WASM file as a .zip file from here (note: rename the .mov file to .zip; I renamed it to be able to upload it here): https://user-images.githubusercontent.com/13289184/203989968-4639d6d5-15c9-4e57-bf01-c2eccd756642.mov, and also copy the file dotnet-codabix-wasm-build.wasm into this directory.
  • Run ReproWasmtimeAccessViolation.exe.

After some seconds, the program will exit if the access violation occurred (in Windows Error Reporting, there will be an “Application Error” entry in the Application log). If it didn’t occur, the program will print "wasmtime_func_call completed! This should not be displayed if the Access Violation occured.", and you can use Ctrl+C to exit.

Unfortunately, the crash doesn’t always occur; it might also depend on the hardware or other factors. You may have to try a few times if the crash doesn’t occur, or try again at a later time.

When attaching Visual Studio 2022 as debugger (using “native code” mode), it will show the following:

Exception thrown at 0x000002BB0392BD19 in ReproWasmtimeAccessViolation.exe: 0xC0000005: Access violation writing location 0x000000CA214E4E80.

Call Stack:

>	000002bb0392bd19()	Unknown
 	000002bb0392b8c4()	Unknown
 	000002bb0429bb8c()	Unknown
 	000002bb03e5b2b2()	Unknown
 	000002bb03e616c0()	Unknown
 	000002bb03e5d4a3()	Unknown
 	000002bb03a369dc()	Unknown
 	000002bb03951aaf()	Unknown
 	000002bb0392b8c4()	Unknown
 	000002bb0429bb8c()	Unknown
 	000002bb03e5b2b2()	Unknown
 	000002bb03e616c0()	Unknown
 	000002bb03e5d4a3()	Unknown
 	000002bb038caa53()	Unknown
 	000002bb03a2c701()	Unknown
 	000002bb039f78bc()	Unknown
 	000002bb0392b8c4()	Unknown
 	000002bb0429bb8c()	Unknown
 	000002bb03e5b2b2()	Unknown
 	000002bb03e616c0()	Unknown
 	000002bb03e5d4a3()	Unknown
 	000002bb038caa53()	Unknown
 	000002bb03a2c701()	Unknown
 	000002bb039f78bc()	Unknown
 	000002bb0392b8c4()	Unknown
 	000002bb0429bb8c()	Unknown
 	000002bb03e5b2b2()	Unknown
 	000002bb03e5a965()	Unknown
 	000002bb03ac8ee0()	Unknown
 	000002bb03ac3058()	Unknown
 	000002bb04294c00()	Unknown
 	000002bb042df4c4()	Unknown
 	000002bb03aaa51f()	Unknown
 	000002bb0382174a()	Unknown
 	000002bb038210d4()	Unknown
 	000002bb044cf7bc()	Unknown
 	000002bb044cfdbb()	Unknown
 	wasmtime.dll!wasmtime_setjmp()	C
 	wasmtime.dll!_ZN16wasmtime_runtime12traphandlers84_$LT$impl$u20$wasmtime_runtime..traphandlers..call_thread_state..CallThreadState$GT$4with17h74c010e2e721401eE()	Unknown
 	wasmtime.dll!_ZN16wasmtime_runtime12traphandlers11catch_traps17h7bef31bead668a92E()	Unknown
 	wasmtime.dll!_ZN8wasmtime4func4Func18call_unchecked_raw17hcb0b0b7982c6fc66E.llvm.11183561978299839151()	Unknown
 	wasmtime.dll!_ZN8wasmtime4func4Func9call_impl17hcb50041b4478fa06E.llvm.11183561978299839151()	Unknown
 	wasmtime.dll!wasmtime_func_call()	Unknown
 	00007ffd5615ce96()	Unknown
 	00007ffd5615cc68()	Unknown
 	00007ffd5615c826()	Unknown
 	00007ffd5615c66b()	Unknown
 	00007ffd5615a56b()	Unknown
 	00007ffd5615a0ac()	Unknown
 	00007ffd55f3fd86()	Unknown
 	00007ffd55f3ef8b()	Unknown
 	System.Private.CoreLib.dll!00007ffd3609274f()	Unknown
 	coreclr.dll!00007ffdb5aaaac3()	Unknown
 	coreclr.dll!00007ffdb59a6d9c()	Unknown
 	coreclr.dll!00007ffdb5a8c413()	Unknown
 	coreclr.dll!00007ffdb59836f5()	Unknown
 	coreclr.dll!00007ffdb59835fa()	Unknown
 	coreclr.dll!00007ffdb5983419()	Unknown
 	kernel32.dll!00007ffdeafe74b4()	Unknown
 	ntdll.dll!00007ffdeb2c26a1()	Unknown

Additional notes:

  • On my system, the crash reproduces in about 50% of the cases.
  • I can’t seem to reproduce the crash when using a Wasmtime debug build.

Environments where I can reproduce the crash:

  • Environment 1:
    • Hardware: Apple Mac mini (2018), with Parallels Desktop to run the Windows VM
    • CPU: Intel® Core™ i7-8700B CPU @ 3.20GHz 3.19 GHz
    • VM (Parallels Desktop on macOS): Windows 10 Pro x64 Build 19044
  • Environment 2:
    • Hardware: Intel NUC8i7BEH
    • CPU: Intel Core i7-8559U
    • VM (VMware Player 16 on Windows 10 Build 19044): Windows Server 2019 Build 17763

Thank you!

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 24 (24 by maintainers)

Most upvoted comments

Indeed! I’ve opened https://github.com/bytecodealliance/wasmtime/pull/5353 to add support for aarch64

Yes, I’ll create a patch to enable it unconditionally (for x86_64).

Not yet, no, and you can’t do that today due to the issue @jameysharp mentioned above. @peterhuene were you going to work on a patch to enable that?

Personally I think it’s reasonable to unconditionally enable this for all supported platforms, since stack probes shouldn’t have much of a perf impact anyway. This is only implemented for x86_64 though: other platforms don’t have inline stack probes implemented yet, and outlined probing support would need more changes in Wasmtime that we may not yet be ready for.

Here’s a simplified wat that reproduces the problem. It has a function with a frame size of 80192 bytes, which consistently causes wasmtime repro.wat to crash:

C:\Users\User\src\wasmtime>target\release\wasmtime.exe repro.wat

C:\Users\User\src\wasmtime>echo %ERRORLEVEL%
-1073741819
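
The value printed by echo %ERRORLEVEL% is the process exit code shown as a signed 32-bit integer. A minimal sketch of the conversion (the helper name to_ntstatus is made up for illustration) shows that it is the same 0xC0000005 access violation code that the debugger reported in the original post:

```python
def to_ntstatus(errorlevel: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status code."""
    # Masking with 0xFFFFFFFF maps the negative value onto its
    # two's-complement unsigned 32-bit representation.
    return f"0x{errorlevel & 0xFFFFFFFF:08X}"


print(to_ntstatus(-1073741819))  # prints 0xC0000005
```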

With the host compatibility check in the engine elided and stack probing enabled, it no longer crashes:

C:\Users\User\src\wasmtime>target\release\wasmtime.exe --cranelift-enable enable_probestack --cranelift-set probestack_strategy=inline repro.wat
warning: using `--invoke` with a function that returns values is experimental and may break in the future
0

I think we can confidently say that it’s due to a missing probe for these large frame allocations.

FYI, I believe function 311 in the module to be the faulting function as it has thousands of locals (lol: wasm-tools print prints to column 150221 for the locals declaration)!

Also, I will note that the default thread stack size for a 64-bit .NET Core apphost is 1.5 MiB, and at the time of the fault there was still plenty of wasm stack space left (which is why it passed the prologue overflow check); it just didn’t probe to get Windows to commit more pages.

@bjorn3 thanks for the reminder as I’ve willfully purged my Windows knowledge these past few years!

This is indeed what is happening here; the frame is larger than the commit size in the PE header (4 KiB), so it’s not triggering a fault on the guard page for Windows to commit more stack pages.

We should definitely be forcing a probe on Windows for any frame larger than 4 KiB at a minimum.
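
To make the 4 KiB rule concrete, here is a rough sketch of how many page touches a frame would need if every full page it spans is probed in order. The function name and the one-probe-per-page policy are assumptions for illustration, not Cranelift's actual inline probe lowering; the 80192-byte frame size is the one from the simplified repro.

```python
PAGE_SIZE = 4096  # 4 KiB, the page/commit size discussed in this thread


def probe_offsets(frame_size: int, page_size: int = PAGE_SIZE) -> list[int]:
    """Offsets below the incoming stack pointer to touch, one per full page."""
    # A frame smaller than one page yields no probes at all.
    return list(range(page_size, frame_size + 1, page_size))


offsets = probe_offsets(80192)
print(len(offsets), offsets[0], offsets[-1])  # prints 19 4096 77824
```

Without these intermediate touches, the first write into such a frame can land many pages below the currently committed region.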

On Windows we need stack probes anyway to grow the stack. As I understand it, on Windows only a small part of the stack is committed by default, and directly below it is a guard page which, if accessed, will cause the stack to grow a bit. If you access even a single page below the guard page, you will get an AV even though it is still in the part of the memory reserved for the stack. cg_clif also had access violations on Windows in some cases, which it fixed by enabling inline stack probes. cc https://github.com/bjorn3/rustc_codegen_cranelift/issues/1261#issuecomment-1219307630
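
The guard page behavior described here can be sketched with a toy model. This is a deliberate simplification, not how Windows or Wasmtime is implemented: the class and function names are hypothetical, pages are numbered downward from the top of the stack, and the 1.5 MiB reservation, 4 KiB initial commit, and 80192-byte frame are taken from the numbers quoted in this thread.

```python
PAGE = 4096


class AccessViolation(Exception):
    pass


class GuardedStack:
    """Toy model: pages are numbered downward from the top of the stack."""

    def __init__(self, reserved: int, committed: int):
        self.reserved_pages = reserved // PAGE
        # The guard page is the page right below the committed ones.
        self.committed_pages = committed // PAGE

    def touch(self, depth: int) -> None:
        """Access the byte `depth` bytes below the top of the stack."""
        page = depth // PAGE
        if page < self.committed_pages:
            return  # already committed
        if page == self.committed_pages and page < self.reserved_pages:
            self.committed_pages += 1  # guard hit: commit it, guard moves down
            return
        raise AccessViolation(f"page {page} is below the guard page")


def enter_frame(stack: GuardedStack, frame_size: int, probe: bool) -> str:
    """Write to the far end of a new frame, optionally probing each page first."""
    try:
        if probe:
            for depth in range(PAGE, frame_size, PAGE):
                stack.touch(depth)
        stack.touch(frame_size - 1)
        return "ok"
    except AccessViolation:
        return "access violation"


RESERVED = 1536 * 1024  # 1.5 MiB, the .NET apphost default mentioned in this thread
print(enter_frame(GuardedStack(RESERVED, PAGE), 80192, probe=False))  # access violation
print(enter_frame(GuardedStack(RESERVED, PAGE), 80192, probe=True))   # ok
```

In this model the unprobed write fails even though it is well within the reserved 1.5 MiB, which matches the observation that the prologue overflow check passed while the write still faulted.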

@kpreisser thanks for the report and the additional details. I’ll investigate this immediately and follow up when I have new information to share.