aspnetcore: [ARM][.NET 7] - System.ArgumentOutOfRangeException on ThreadPool ctor
Description
Hi there!
We’re running our system on .NET 7 (more specifically, 7.0.12) in a Linux environment, and after switching to ARM instances we noticed that the instances started crashing.
We were unable to pinpoint a cause, but it happens fairly consistently: for a service running with 120 instances, at least 5 are unhealthy at any given point in time. So far we haven’t found a specific trigger for the crash. The exception we see in the logs is the following:
```
Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
   at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
   at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
```
We’ve also noticed that this problem was reported in #84407 and fixed in version 7.0.7 by PR #84641. Nevertheless, we still see it happening on ARM; when running on AMD instances, we didn’t notice the same behavior.
Do you have any idea about what may be causing that? Thanks!
Reproduction Steps
Run a .NET 7.0.12 Web API in Docker on top of an AWS EC2 Linux image.
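For reference, a minimal sketch of a setup along these lines, assuming a standard multi-stage build (the project name MyWebApi is a placeholder, not the actual service):

```dockerfile
# Build stage: publish the Web API with the .NET 7 SDK.
FROM mcr.microsoft.com/dotnet/sdk:7.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

# Runtime stage: the image mentioned in the Configuration section below.
FROM mcr.microsoft.com/dotnet/aspnet:7.0
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "MyWebApi.dll"]
```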
Expected behavior
We expect to see no crashes in the SocketAsyncEngine, as shown in the exception above.
Actual behavior
Currently, running a service with ~120 instances, at some point in time some of the instances crash with the following exception:
```
Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
   at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
   at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
```
Regression?
No response
Known Workarounds
No response
Configuration
We’re running a .NET 7.0.12 web API
We are using the docker image available at mcr.microsoft.com/dotnet/aspnet:7.0
Running on a Debian GNU/Linux 11 (bullseye)
Other information
No response
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Reactions: 3
- Comments: 23 (13 by maintainers)
I transferred the issue to ASP.NET for triage.
cc @amcasey
This is a backport request of https://github.com/dotnet/aspnetcore/pull/50624 to 7.0, asked for by several customers; see above.
We plan to ship these in the January release, which is the next one.
@isadoraq The PR is tagged 7.0.15, so I’m guessing it will be in the next one. @wtgodbe may have a more precise answer.
Hi folks, just adding that we managed to run the test on our side as well, and the results were pretty good. The problem seems to be solved for us, and we didn’t notice any other errors. Thanks again for your support, @antonfirsov.
@pkubitsc The PR says 6.0.24, but I don’t know if that’s up to date. I’d say no later than 6.0.26, in any case.
Edit: the actual commit doesn’t have a release tag, so I’m guessing it missed 6.0.24 and the milestone is out of date. I’d expect it to be in the next 6.0 patch that goes out, whatever number that happens to have.
The 6.0 and 7.0 PRs have been merged. Thanks for driving, @antonfirsov, and validating, @tiaraju!
I have created a patched build on top of the 7.0.12 state: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.zip
You need to replace Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll in /usr/lib/dotnet/shared/Microsoft.AspNetCore.App/7.0.12 (or whatever folder your runtime lives in) with the provided file.
Let us know if this makes the issue go away!
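If you are applying the patched assembly inside a container image, here is a sketch of one way to overlay it; the destination path is an assumption based on the folder mentioned above and will differ depending on where your runtime is installed:

```dockerfile
FROM mcr.microsoft.com/dotnet/aspnet:7.0

# Overwrite the shipped assembly with the patched build from the zip above.
# The destination folder is an assumption; adjust it to wherever the 7.0.12
# runtime actually lives in your image (check `dotnet --list-runtimes`).
COPY Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll \
     /usr/lib/dotnet/shared/Microsoft.AspNetCore.App/7.0.12/
```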
We are observing this as well (several application crashes/day).
Our environment is various containerized workloads; the issue manifests on multiple patch versions of dotnet, ranging from 7.0.5 through latest (7.0.12).
Not able to run an RC, but we could incorporate a patched assembly into our runtime container if that helps get the fix backported.
Cool. We’ll run the tests, and I’ll come back with the results as soon as we have them.
@tiaraju yes.