aspnetcore: [linux-arm64] Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets crashes
Edit by @antonfirsov: this can be fixed by #50624, see the latest conversation.
Describe the bug
We have been running on Graviton2 instances for years without any runtime errors. Since upgrading to Graviton3, our application runtime crashes quite frequently (at least several times per day) with:
System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
This seems similar to issues previously raised here, which also affected ARM: https://github.com/dotnet/runtime/issues/70486 https://github.com/dotnet/runtime/issues/84407
However, those are marked as fixed, which does not seem to be the case.
To Reproduce
We were unable to produce sample code reproducing the error, but are facing the error in a mission-critical service every day.
Exceptions (if any)
System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
Further technical details
.NET SDK: Version: 7.0.400 Commit: 73bf45718d
Runtime Environment: OS Name: debian OS Version: 11 OS Platform: Linux RID: debian.11-arm64 Base Path: /usr/share/dotnet/sdk/7.0.400/
Host: Version: 7.0.10 Architecture: arm64 Commit: a6dbb800a4
.NET SDKs installed: 7.0.400 [/usr/share/dotnet/sdk]
.NET runtimes installed: Microsoft.AspNetCore.App 7.0.10 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App] Microsoft.NETCore.App 7.0.10 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Hosting: Docker via Hashicorp Nomad on EC2, specifically c7g.2xlarge
IDE: Rider 2023.1.4
About this issue
- State: closed
- Created 10 months ago
- Comments: 27 (16 by maintainers)
This could be a Kestrel bug. @halter73 @davidfowl is it possible that _continuation should be made volatile here, or that you need to adapt something like https://github.com/dotnet/runtime/pull/82147?
https://github.com/dotnet/aspnetcore/blob/203730e1fdc17ae8b0f704e897b523b45df949fb/src/Servers/Kestrel/Transport.Sockets/src/Internal/SocketAwaitableEventArgs.cs#L21
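For readers following along, here is a minimal sketch of the kind of publication race the question above hints at. It is not the Kestrel source: the class AwaitableSketch, the field _state and the sentinel are hypothetical names invented for illustration. The assumption is that one thread stores a state object and then publishes a continuation, while another thread reads the continuation and then the state. On a weakly ordered CPU such as ARM64, a plain (non-volatile) read of the continuation may not be ordered before the read of the state, so the completing thread could in principle observe a stale state. Marking the field volatile gives the read acquire semantics.

```csharp
using System;
using System.Threading;

// Hypothetical, simplified awaitable; NOT the actual SocketAwaitableEventArgs code.
sealed class AwaitableSketch
{
    // Marker stored when the operation completes before a continuation is registered.
    private static readonly Action<object?> s_completedSentinel = _ => { };

    // volatile: reads of this field have acquire semantics, writes have release semantics.
    private volatile Action<object?>? _continuation;
    private object? _state;

    // Awaiter side: store the state first, then publish the continuation.
    public void OnCompleted(Action<object?> continuation, object? state)
    {
        _state = state;
        // Interlocked operations act as full fences, so the _state write
        // is visible to any thread that observes the published continuation.
        var previous = Interlocked.CompareExchange(ref _continuation, continuation, null);
        if (ReferenceEquals(previous, s_completedSentinel))
        {
            // The operation already completed; run the continuation inline.
            _state = null;
            continuation(state);
        }
    }

    // Completion side: read the continuation, then the state it was published with.
    public void Complete()
    {
        var c = _continuation; // acquire read: the later _state read cannot move before it
        if (c != null || (c = Interlocked.CompareExchange(ref _continuation, s_completedSentinel, null)) != null)
        {
            var state = _state;
            _state = null;
            c(state);
        }
    }
}
```

Without the volatile modifier, the fast path in Complete would be an ordinary read, which is the situation the question above asks about. Whether this matches the actual change is best checked against the linked pull requests.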
Both our stress test and production environments have been running the patched version since Friday, and there have been no more crashes. 👍
I also can’t see the dump, and going down the rabbit hole with it is a scenario I really hope to avoid 😃
I made a patched build of Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.zip
@tommaybe please replace the file under <DOTNETDIR>/shared/Microsoft.AspNetCore.App/8.0.0-preview.7.23375.9/, run your app with .NET 8.0 preview 7 and let us know whether it eliminates the crash.
@karelz An 8.0 fix is fully sufficient for us.
Thanks for everyone's help on getting this resolved.
@karelz The 8.0 fix has been merged: https://github.com/dotnet/aspnetcore/pull/50690
+1 on needing more info for further backports.
Thanks @tommaybe, great news! Since we now have proof that this is a bug in the Kestrel socket transport, I transferred the issue to aspnetcore.
I am very much in favor of that. 👍 Let me know how to proceed.
@tommaybe would it be possible for you to test a custom build of Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll, preferably in a .NET 8.0 preview 7 setup?
@tommaybe can you send the dump to Anton.Firszov@microsoft.com? Are you sure all the ValueTask usages in your application are correct? Is there any other middleware between your app code and the runtime other than grpc-dotnet?
@stephentoub any troubleshooting ideas on this?