runtime: Memory corruption (EEE, AV, etc.) with GCStress and blank WF or WPF app

Description

While working on my app (Paint.NET) for the past few months, I’ve been experiencing a rare, random EEE (ExecutionEngineException) or AV (access violation) in the debugger. Every few weeks I get interrupted by one of these, and I have not been able to debug it far enough to find the cause. Delving into windbg shows some kind of corruption in internal .NET state, such as an internal .NET method trying to call a method that it’s literally not written to call, indicating that some method pointer is now pointing at the wrong method (but still pointing at an actual method!). (That’s just one example of what I’ve seen; it’s not consistent.)

Eventually I was able to get it to semi-reliably repro if I forced GC to happen very frequently. I have a keyboard shortcut set up for this, Ctrl + Alt + Shift + ` (tilde key). If I hold it down while doing “various” things, my app will usually crash within a minute or so.

So I started running my app with DOTNET_GCStress=[various] and I literally can’t get my app to survive the startup code path and show the main window. It always crashes in all sorts of weird places, sometimes as bad as just saying “unknown module” with no call stack on the main thread. There are a few places where it crashes more consistently, like in resource loading code (e.g. new ResourceSet(stream)) but it’s still effectively random. Sometimes it doesn’t crash with an EEE/AV, but I get something like IndexOutOfRangeException while building a HashSet<int> (no multithreaded access), or an assert will fire on Debug.Assert(path == path.Trim()) but path has no whitespace in it.

The finalizer thread is often, but not always, parked on System.GC.GetGCMemoryInfo(System.GCKind) but that may just be coincidence. Another common one for the finalizer thread is in a Gen2GCCallback related to TlsOverPerCoreSomething, maybe related to the array pool.

There have also been some instances where it crashed before my Main() method was even invoked. This is probably the most disturbing one that I’ve seen, and it’s happened multiple times while I’ve been investigating this.

In an attempt to whittle things down to a minimum repro, I came across two of them: If I create a blank WinForms or WPF app using the stock, built-in templates, I can easily cause a crash of the same kind, either an EEE or an AV.

For the WinForms app, it does start up and then it eventually dies while I’m resizing the window (which is pretty much the only possible interaction with the app). The WPF app doesn’t get that far.

So I’m pretty sure there’s at least 1 bug somewhere in .NET, possibly in the GC.

I’ve also tried running my app with page heap enabled (gflags /p /enable paintdotnet.exe) but it did not raise any errors.

I can provide dumps or other artifacts, but the two minimal repros will hopefully be enough to run with.

Reproduction Steps

Two repros so far:

WinForms

  1. Create a new WinForms project in Visual Studio 2022 (latest or preview, doesn’t seem to matter)
  2. In the debug properties, add DOTNET_GCStress = 3
  3. Start the app with Debug -> Start Debugging
  4. Once the main window shows, grab a corner of the window and start resizing it. Don’t let up on the mouse button until it crashes, which should happen within 10 seconds.

WPF

  1. Create a new WPF project in Visual Studio 2022 (latest or preview, doesn’t seem to matter)
  2. In the debug properties, add DOTNET_GCStress = 3
  3. Start the app with Debug -> Start Debugging
  4. It will crash before the main window is shown
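For reference, step 2 in both repros ends up as an environment variable in the project's Properties/launchSettings.json. The fragment below is only a sketch of what that file might look like: the profile name "BlankApp" is a placeholder (Visual Studio normally names the profile after the project), and the rest of the file is omitted.

```json
{
  "profiles": {
    "BlankApp": {
      "commandName": "Project",
      "environmentVariables": {
        "DOTNET_GCStress": "3"
      }
    }
  }
}
```

Setting the same variable in the shell before launching the built executable should be equivalent, since the runtime reads DOTNET_GCStress from the process environment.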

You could probably install the latest public release of Paint.NET (v4.3.11) and do “various” things while holding down Ctrl + Alt + Shift + ` (tilde key), which forces a constant GC (full + LOH compaction). This method is much less reliable and can take a while. I haven’t specifically tested this, either, and v4.3.11 is still on .NET 6.0.5, but I might dive into that anyway: if it doesn’t crash, that would point towards some change in .NET 6.0.6 or 6.0.7.
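A forced “full + LOH compaction” collection of the kind described above can be sketched roughly as follows. This is not the actual Paint.NET code: the ForcedGc class and CollectEverything method are hypothetical names, and the sketch assumes the shortcut performs a blocking, compacting gen2 collection with a one-time large object heap compaction on each invocation.

```csharp
using System;
using System.Runtime;

static class ForcedGc
{
    // Hypothetical helper: one full, blocking, compacting collection,
    // including a one-time compaction of the large object heap (LOH).
    public static void CollectEverything()
    {
        // Ask the GC to compact the LOH during the next blocking gen2 collection.
        GCSettings.LargeObjectHeapCompactionMode =
            GCLargeObjectHeapCompactionMode.CompactOnce;

        // Force a full collection of all generations.
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced,
                   blocking: true, compacting: true);

        // Let the finalizer thread drain before returning.
        GC.WaitForPendingFinalizers();
    }

    static void Main()
    {
        int before = GC.CollectionCount(GC.MaxGeneration);
        CollectEverything();
        int after = GC.CollectionCount(GC.MaxGeneration);
        Console.WriteLine($"Gen2 collections: {before} -> {after}");
    }
}
```

Calling a helper like this repeatedly from a key handler (for example, while the shortcut is held down) would approximate the “constant GC” pressure used to provoke the crash.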

Expected behavior

No crashes, just slow performance due to the use of GCStress.

Actual behavior

Crash

Regression?

Unknown

Known Workarounds

None

Configuration

  • .NET 6.0.7
  • Windows 11 build 22621.232 (Beta? Release Preview?)
  • x64 (Ryzen 5950X)

Other information

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 16 (16 by maintainers)

Most upvoted comments

“Windows 11 build 22621.232 (Beta? Release Preview?)”

Yes, this looks like a preview.

Windows preview builds recently had a regression in the GetThreadContext API that may explain intermittent crashes like this. The fix is propagating through the system. For reference, it is Windows OS issue 40313032.

I’m also using Windows 11, but I cannot reproduce the crash with the blank WinForms/WPF project. However, I’m not using a preview version.

I’m also on .NET 6.0.7, but with an Intel x64 CPU.