aspnetcore: [ARM][.NET 7] - System.ArgumentOutOfRangeException on ThreadPool ctor

Description

Hi there!

We’re running our system on .NET 7 (specifically 7.0.12) in a Linux environment. After switching to ARM instances, we noticed that the instances started crashing.

We were unable to pinpoint the cause. It happens consistently: for a service running 120 instances, at least 5 are unhealthy at any given time. So far, we haven’t found anything that triggers the crash. The exception we see in the logs is the following:

Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
   at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
   at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()

We also noticed that this problem was reported in #84407 and fixed in version 7.0.7 by PR #84641. Nevertheless, we still see it happening on ARM. When running on AMD, we don’t see the same behavior.

Do you have any idea about what may be causing that? Thanks!

Reproduction Steps

Run a .NET 7.0.12 Web API in Docker on top of an AWS EC2 Linux image.

Expected behavior

We expect to see no crashes in SocketAsyncEngine like the one in the exception above.

Actual behavior

Currently, when running a service with ~120 instances, some instances eventually start to crash with the following exception message:

Unhandled exception. System.ArgumentOutOfRangeException: Specified argument was out of the range of valid values. (Parameter 'state')
   at System.Threading.ThreadPool.<>c.<.cctor>b__78_0(Object state)
   at System.Net.Sockets.SocketAsyncEngine.System.Threading.IThreadPoolWorkItem.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()

Regression?

No response

Known Workarounds

No response

Configuration

We’re running a .NET 7.0.12 web API. We are using the Docker image available at mcr.microsoft.com/dotnet/aspnet:7.0, running on Debian GNU/Linux 11 (bullseye).

Other information

No response

About this issue

  • State: closed
  • Created 9 months ago
  • Reactions: 3
  • Comments: 23 (13 by maintainers)

Most upvoted comments

I transferred the issue to ASP.NET for triage.

cc @amcasey

This is a request to backport https://github.com/dotnet/aspnetcore/pull/50624 to 7.0, asked for by several customers (see above).

We plan to ship these in the January release, which is the next one.

@isadoraq The PR is tagged 7.0.15, so I’m guessing it will be in the next one. @wtgodbe may have a more precise answer.

Hi folks, just adding here that we managed to run the test on our side as well, and the results were pretty good. The problem seems to be solved for us and we didn’t notice any other errors. Thanks again for your support, @antonfirsov!

@pkubitsc The PR says 6.0.24, but I don’t know if that’s up to date. I’d say no later than 6.0.26, in any case.

Edit: the actual commit doesn’t have a release tag, so I’m guessing it missed 6.0.24 and the milestone is out of date. I’d expect it to be in the next 6.0 patch that goes out, whatever number that happens to have.

The 6.0 and 7.0 PRs have been merged. Thanks for driving, @antonfirsov, and validating, @tiaraju!

I have created a patched build on top of the 7.0.12 state: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.zip

You need to replace Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll in /usr/lib/dotnet/shared/Microsoft.AspNetCore.App/7.0.12 (or whatever folder your runtime lives in) with the provided file.

Let us know if this makes the issue go away!
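
For anyone scripting the replacement step above, here is a minimal Python sketch. The function name `replace_assembly`, the `.orig` backup suffix, and the idea of keeping a backup at all are assumptions added for illustration; they are not part of the patched build. The framework folder is whatever folder your runtime lives in.

```python
import shutil
from pathlib import Path

# File name taken from the comment above.
DLL_NAME = "Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll"


def replace_assembly(framework_dir: Path, patched_dll: Path) -> Path:
    """Overwrite the shipped DLL with the patched one, keeping a backup.

    Returns the backup path so the change can be rolled back.
    (Hypothetical helper; the backup naming is an assumption.)
    """
    target = framework_dir / DLL_NAME
    if not target.is_file():
        raise FileNotFoundError(f"{target} not found; check the runtime folder")
    if not patched_dll.is_file():
        raise FileNotFoundError(f"patched file {patched_dll} not found")

    backup = target.with_name(target.name + ".orig")
    if not backup.exists():
        # Back up only once, so a second run cannot overwrite the original copy.
        shutil.copy2(target, backup)

    shutil.copy2(patched_dll, target)
    return backup
```

It could be called as `replace_assembly(Path("/usr/lib/dotnet/shared/Microsoft.AspNetCore.App/7.0.12"), Path("Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.dll"))`, with the second argument pointing at the file extracted from the zip. The application most likely needs a restart afterwards, since a process that is already running has the old assembly loaded.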

We are observing this as well (several application crashes per day).

Our environment consists of various containerized workloads; the issue manifests on multiple patch versions of dotnet, ranging from 7.0.5 through the latest (7.0.12).

We are not able to run an RC, but we could incorporate a patched assembly into our runtime container if that helps get the fix backported.

@tiaraju yes.

Cool. We’ll run the tests and I’ll come back with the results as soon as we have them.