aspnetcore: [linux-arm64] Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets crashes

Edit by @antonfirsov: this can be fixed by #50624, see the latest conversation.


Describe the bug

We have been running on Graviton2 instances for years without any runtime errors. Since upgrading to Graviton3, our application runtime crashes quite frequently (at least several times per day) with:

System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
   at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
   at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()

This seems similar to issues previously raised here that also affect ARM: https://github.com/dotnet/runtime/issues/70486 https://github.com/dotnet/runtime/issues/84407

However, those issues are marked as fixed, which does not seem to be the case for us.

To Reproduce

We were unable to produce sample code reproducing the error, but are facing the error in a mission-critical service every day.

Exceptions (if any)

System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
   at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
   at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()

Further technical details

.NET SDK:
  Version: 7.0.400
  Commit: 73bf45718d

Runtime Environment:
  OS Name: debian
  OS Version: 11
  OS Platform: Linux
  RID: debian.11-arm64
  Base Path: /usr/share/dotnet/sdk/7.0.400/

Host:
  Version: 7.0.10
  Architecture: arm64
  Commit: a6dbb800a4

.NET SDKs installed:
  7.0.400 [/usr/share/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.10 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.10 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

Hosting: Docker via Hashicorp Nomad on EC2, specifically c7g.2xlarge
IDE: Rider 2023.1.4

About this issue

  • State: closed
  • Created 10 months ago
  • Comments: 27 (16 by maintainers)

Most upvoted comments

The thread stack suggests that the exception originates in Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver. That class derives from SocketAwaitableEventArgs, which, according to its comments, is a “slimmed down version” of the code that was touched in the PR https://github.com/dotnet/runtime/pull/84432.

This could be a Kestrel bug. @halter73 @davidfowl is it possible that _continuation should be made volatile here, or that you need to adapt something like https://github.com/dotnet/runtime/pull/82147?

https://github.com/dotnet/aspnetcore/blob/203730e1fdc17ae8b0f704e897b523b45df949fb/src/Servers/Kestrel/Transport.Sockets/src/Internal/SocketAwaitableEventArgs.cs#L21
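
For readers unfamiliar with the pattern in question, the following is a minimal sketch of a one-shot continuation handoff between an awaiting thread and an I/O completion thread, with the shared field marked volatile. It is not the Kestrel source and not the merged fix: the class name ContinuationSlot, the method Complete, and the fields s_completed and _state are hypothetical names chosen for illustration only. It is meant only to show what "making _continuation volatile" could look like on a weakly ordered CPU such as ARM64.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical, simplified illustration; NOT the Kestrel SocketAwaitableEventArgs code.
public sealed class ContinuationSlot
{
    // Sentinel stored in the slot once the operation has completed.
    private static readonly Action<object?> s_completed = _ => { };

    // 'volatile' gives reads acquire semantics and writes release semantics,
    // so a plain read of this field cannot be reordered after later reads.
    private volatile Action<object?>? _continuation;
    private object? _state;

    // Awaiting side: register the callback to run when the operation completes.
    public void OnCompleted(Action<object?> continuation, object? state)
    {
        // Write the state first; the interlocked operation below is a full
        // fence, so the state is visible before the continuation is published.
        _state = state;
        Action<object?>? prev = Interlocked.CompareExchange(ref _continuation, continuation, null);
        if (ReferenceEquals(prev, s_completed))
        {
            // The operation already finished, so run the callback directly.
            _state = null;
            continuation(state);
        }
    }

    // I/O side: signal completion and run the callback if one is registered.
    public void Complete()
    {
        Action<object?>? c = _continuation; // volatile (acquire) read
        if (c != null || (c = Interlocked.CompareExchange(ref _continuation, s_completed, null)) != null)
        {
            object? state = _state;
            _state = null;
            _continuation = s_completed; // volatile (release) write
            c(state);
        }
    }
}
```

In this sketch, whichever side arrives second runs the callback: if OnCompleted wins the race, Complete finds the registered delegate and invokes it with the stored state; if Complete wins, it stores the sentinel and OnCompleted invokes the callback itself. Whether this ordering problem is the actual cause of the crash is an assumption here, not something this sketch establishes.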

Both the stress test and production have been running the patched version since Friday, and there have been no more crashes. 👍

I also can’t see the dump, and going down the rabbit hole with it is a scenario I really hope to avoid 😃

I made a patched build of Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.zip

@tommaybe please replace the file under <DOTNETDIR>/shared/Microsoft.AspNetCore.App/8.0.0-preview.7.23375.9/, run your app with .NET 8.0 preview 7 and let us know whether it eliminates the crash.

@karelz An 8.0 fix is fully sufficient for us.

Thanks for everyone's help in getting this resolved.

@karelz The 8.0 fix has been merged: https://github.com/dotnet/aspnetcore/pull/50690

+1 on needing more info for further backports.

Thanks @tommaybe, great news! Since we now have proof that this is a bug in the Kestrel socket transport, I transferred the issue to aspnetcore.

I am very much in favor of that. 👍 Let me know how to proceed.

@tommaybe would it be possible for you to test a custom build of Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll, preferably in a .NET 8.0 preview 7 setup?

@tommaybe can you send the dump to Anton.Firszov @microsoft.com? Are you sure all the ValueTask usages in your application are correct? Is there any other middleware between your app code and the runtime other than grpc-dotnet?

@stephentoub any troubleshooting ideas on this?