runtime: [linux-arm64] Random and rare runtime crash System.ArgumentOutOfRangeException (System.Net.Sockets)
Description
Random and rare crashes with this exception:
Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.InvokeContinuation(Action`1 continuation, Object state, Boolean forceAsync, Boolean requiresExecutionContextFlow)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.OnCompleted(SocketAsyncEventArgs _)
at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
It seems to happen only in applications under heavy load.
Exit signal: Abort (6)
Reproduction Steps
We don’t have a reproduction yet. We probably need to heavily stress the network! It seems to be a race condition.
Expected behavior
don’t crash the runtime when we are using sockets…
Actual behavior
random and rare crashes of the runtime
Regression?
No response
Known Workarounds
No response
Configuration
Dotnet runtime version: 6.0.6
OS: GNU/Linux Debian 11 Bullseye
CPU: ARM64 Graviton 2 (AWS)
We are using Orleans with this application.
Other information
Follow-up of https://github.com/dotnet/runtime/issues/70486. We triple-checked all usages of ValueTask and removed every one of them, just to be sure that this time the crash is not caused by a ValueTask being awaited twice.
About this issue
- State: closed
- Created 2 years ago
- Comments: 19 (13 by maintainers)
I’m also getting the same error. It looks like it’s also appearing here: https://github.com/aws/aws-lambda-dotnet/issues/1244.
My config:
Dotnet runtime version: v7.0.100-preview.7 (ARM64)
OS: macOS 12.5 (Monterey)
CPU: ARM64 Apple Silicon M1 Max
This happens (again, intermittently) when I’m running/debugging a few microservices (on Kestrel - was unsure if this was a Kestrel issue, but saw this reported here).
One additional piece of info is that in the framework method that throws the exception:
The argument parameter is always (int) 40.
Just like the linked AWS issue, none of my exception handlers seem to be catching the error.
Any ideas on next steps? I can’t seem to isolate the exception for a repro…
Unfortunately this limitation comes from our support of on-premises machines; uploading large dumps from them tends to cost us a lot of time and money, and the dumps are often ignored despite that cost.
If you need to check out a machine with the same specifications as the test one, that can likely be arranged, we’d just need to know the specific queue that this work item ran on (or have its full log linked, etc)