runtime: System.Runtime.InteropServices.ExternalException : A generic error occurred in GDI+
-
.NET Core Version: Microsoft.WindowsDesktop.App 5.0.9
-
Have you experienced this same bug with .NET Framework?: No
Problem description: I switched my WinForms App from .NET 4.71 to .NET 5 and get the following crashes from my user’s machine:
System.Runtime.InteropServices.ExternalException: A generic error occurred in GDI+.
Source: System.Drawing.Common
-- get_ErrorCode = -2147467259 (0x80004005) [int]
-- get_HResult = -2147467259 (0x80004005) [int]
at [System.Drawing.Common] System.Drawing.Graphics void CheckErrorStatus(int )
at [System.Drawing.Common] System.Drawing.Graphics void FillRectangle(System.Drawing.Brush ,System.Drawing.Rectangle )
From the stack trace, I can see that this crash almost always originates from showing a messagebox (drawing the background). Since I’m not able to reproduce the crash myself, I reached out to my users and I was at least able to find a common pattern:
- Many users told me that the crash happens when the machine has been in sleep mode and wakes up. My app would show a message box when a network connection is interrupted, so this makes sense.
- Some users reported that the crash happens when Windows tries to shutdown/reboot while my app is running.
- Some users reported that the crash happens when my app is set to auto-run and shows the first dialog (password prompt). In this case it seems that the app has been started while the login screen is still shown.
Judging from the user reports, I guess this crash mostly happens when the “desktop” is “hidden” or not shown (like lock screen when the machine enters sleep mode, login screen, the screen shown when you shutdown/reboot Windows).
I’m not sure how to further troubleshoot the issue or resolve the issue.
Minimal repro: Unfortunately I’m not able to reliably reproduce the issue myself.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 41 (23 by maintainers)
I actually don’t get to make such calls, we can take it to servicing tactics and ask though. That was my intent for leaving this open.
First things first we need to decide what we do for 6.0. For that I’m still waiting for to hear back from @dotnet/dotnet-winforms / @JeremyKuhne around the questions here: https://github.com/dotnet/runtime/issues/58741#issuecomment-917176972. I put up a PR with what I believe is a minimal fix if we want to bring this back: https://github.com/dotnet/runtime/pull/59096
My best guess is that it just got missed when folks converted to dynamic loading, then didn’t get brought back when dynamic loading was removed in favor of SetDllImportResolver. Perhaps the GetProcAddress behavior in pre-.NET 5 was less susceptible to runtime interruption that clobbered GetLastError.
I think @JeremyKuhne called out why it might not be right to call SetLastError (and for that matter GetLastError as well) since GDI+ doesn’t document that they actually set it, you could be reading an error from some indirect call. Imagine GDI+ does something as part of it’s normal execution, say it checks for the existence of a registry key, the registry key doesn’t exist, which GDI+ sees and deems acceptable, but then fails for some other totally unrelated reason. The error in the last-error slot would be the Win32 error from the registry API which isn’t relevant. If an API doesn’t document that it sets last error, then you are effectively depending on undocumented side-effects of that API.
Judging from the System.Drawing code and comments on CheckErrorStatus, that seems to be exactly what System.Drawing is trying to do: draw some conclusions about why GDI+ returned a particular error, when it’s own status wasn’t granular enough. A better solution here would be to ask GDI+ to return a specific status that means it failed because https://github.com/dotnet/runtime/blob/208e377a5329ad6eb1db5e5fb9d4590fa50beadd/src/libraries/System.Drawing.Common/src/System/Drawing/Graphics.cs#L2410 Judging on the specificity of that comment, I can see why they might be reluctant to do that…
P/Invokng to GetLastError doesn’t help. The purpose of putting SetLastError on a PInvoke is to tell the runtime to store the value of the last error immediately upon return from the PInvoke. Otherwise the runtime might do somethings itself (EG: JIT does something that sets last error, GC, etc) that clears the value before executing the library code that reads it. At that point, calling Win32 GLE won’t give you the right result. I believe the only way to get the right result is to have interop store it (@AaronRobinsonMSFT could confirm).
Adding back SetLastError would restore the .NETFramework behavior, but might cost some in perf. @JeremyKuhne since you made changes in this area for WinForms perf and WinForms team owns this codebase on .NETFramework, can you make a reccomendation? I see 2 options: replace the usage of GetLastError with something that accomplishes the goal without relying on undocumented behavior (perhaps just checking SM_REMOTESESSION), or adding back SetLastError everywhere.
I’m really not an expert here but I don’t understand why the SetLastError flags have been removed in 2019. It seems to be the root cause for those crashes as (at least from what I can tell) there’s no other difference between the .NET Framework 4.71 implementation and the .NET Core one. As a non-expert (without knowing the reason for the SetLastError removal) this seems to be a regression which is causing real issues in production apps. I get A LOT of these crash reports and my users are quite annoyed by this “bug”.
I know it’s hard since there’s no reliable repro case but I have one user who can reproduce the issue most of the time (9 out of 10 times). If I could get my patched DLL to work in my production code, I could at least report back that this would fix the issue.
Quite frankly, unless there’s not a reasonable explanation why the SetLastError has been removed, it should be added again to be on par with the .NET Framework implementation.
Can anyone here provide the reason why they have been removed?
@StefanKoell I’m afraid I don’t know enough about the runtime, area owners should be able to help. (Expect some delays due public holidays in the US). I think @safern owns this library but he’s OOF. @ericstj is there someone else who can help?
@kirsan31 this is a perfect Q for @AaronRobinsonMSFT 😃
I’ll leave that decision to @ericstj.
It seems like it’s viable to put back SetLastError on some or all of the methods (we’d need some help from @JeremyKuhne if you’d want to subset them). I think I can buy this as a safe fix that meets the bar (fixing a regression between 3.x ->5.0)
Do we think it was ever the right thing to do to swallow these errors from GDI+? Don’t they represent real failures of the underlying API? All the call stacks above look to me like they would have side-effects, so an error from the API means it didn’t do the job we asked it to. Is it feasible to tell callers they should be catching these instead? Basically remove the special case. I know we’ve reached here accidentally, but is it a better place? (despite being different than .NETFramework / .NETCore 3.1) I guess this depends on how “lossy” other drawing API is, my impression is that it shouldn’t be…
Sorry, I’ve been out for the holiday weekend.
Note that GDI+ does not document that it sets last error. Looking through the sources it could, theoretically, set last error when it gets an HRESULT from some APIs that return an HRESULT with the Win32 facility code. That shows up in the bitmap and filter code. Outside of that, the underlying Win32 APIs that get called certainly can and do set last error.
The simplest thing here would be to plaster
SetLastError=true
on the P/Invokes (ugh). If we’re talking minimum change (for 6) we could just P/Invoke directly toGetLastError
here. The risk should be minimal for P/Invoking directly toGetLastError
.@maryamariyan @michaelgsharp @safern @tarekgh
That seems like the issue here.
Right. If error codes are expected from a P/Invoke, the
SetLastError
should be set totrue
.It appears the change here was done quite some time ago and to a different mechanism: https://github.com/dotnet/corefx/pull/39275.
/cc @elinor-fung @JeremyKuhne
Nice to see that I’m not alone!
I don’t have any of these reports from Windows 7 machines. This could mean that it doesn’t exist on Windows 7 or my user base doesn’t use Windows 7 anymore.
As for deeper stack traces, here’s one:
I’m also using the DevExpress UI libraries but I also have seen stacktraces like yours, pure WinForms. Since the FillRectangle method seems to be a common entry point for most stack traces I get, I guess it has something to do with the GDI graphics context when Windows shows a non interactive desktop, such as the lock, login or shutdown screen.
I hope someone can shed some light on the error and on how to further troubleshoot the issue. If I know which data I should collect, I could try to attach this to my crash reports for further investigation.
We made this fix to 6.0 and 7.0. Will leave this open to port for servicing.
On the same thread?
More of a curiosity. It would be good to get to a place where we have the interop code in sources rather than generating it at runtime- wondering if we can leverage that for this part of things.
Any number of issues can occur, but particularly a GC – which can occur anywhere in the IL stub.
I guess, but why not simply add it back? What is the concern here? P/Invokes are much better than the delegate approach – at least from a maintenance perspective.
@RussKie If I can get a hand on a special build with this attribute, I could roll that out to my users in order to find out if it resolves the issue. Unfortunately I’m not able to provide a stand alone repro. I’m not sure what other condition must be met to get the crash. I haven’t been able to reproduce the crash on my machines.
Good catch! I hope that someone at MS (@OliaG) can take a look at it.