runtime: .NET 6 Managed `ThreadPool` / `Thread` performance and reliability degradations
Description
Since upgrading Akka.NET’s test suite to run on .NET 6, we’ve noticed two major classes of problems that occur on .NET 6 only:
- Predictable, sizeable drops in total throughput in busy systems such as our remoting benchmarks documented here https://github.com/akkadotnet/akka.net/issues/5385
- `Thread` behavior that causes tools such as our `HashedWheelTimerScheduler` to no longer be reliable, i.e. https://github.com/akkadotnet/akka.net/blob/60807febbc63c249a113f4768ad597bcb55bf56e/src/core/Akka.Tests/Actor/Scheduler/TaskBasedScheduler_TellScheduler_Cancellation_Tests.cs#L128-L150 fails, which is odd given that that code runs on its own dedicated thread. It looks as though the `Thread` is totally unable to start or get scheduled within a reasonable period of time (i.e. 3 seconds). We run our test suite sequentially so we don’t have load problems that could cause this.
We run the exact same tests and benchmarks on .NET Core 3.1 and .NET Framework 4.7.1, and neither of those platforms exhibits these symptoms (large performance drops, long delays in execution).
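The second symptom (a dedicated thread not being scheduled within about 3 seconds) can be probed in isolation. What follows is a minimal sketch under stated assumptions, not the Akka.NET test or the `HashedWheelTimerScheduler` itself: `ThreadStartProbe` and `MeasureStartLatency` are hypothetical names, and the 3-second timeout simply mirrors the figure mentioned above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical probe (not part of Akka.NET): measures how long a newly
// created dedicated thread takes to begin executing after Thread.Start().
public static class ThreadStartProbe
{
    public static TimeSpan MeasureStartLatency(TimeSpan timeout)
    {
        var stopwatch = new Stopwatch();
        var latency = TimeSpan.Zero;

        var thread = new Thread(() =>
        {
            // Elapsed time between Start() and the first statement on the new thread.
            latency = stopwatch.Elapsed;
        })
        {
            IsBackground = true,
            Name = "start-latency-probe"
        };

        stopwatch.Start();
        thread.Start();

        // Join returns false if the thread has not finished within the timeout.
        if (!thread.Join(timeout))
            throw new TimeoutException($"Thread did not run within {timeout.TotalSeconds} s.");

        return latency;
    }

    public static void Main()
    {
        for (var i = 0; i < 10; i++)
        {
            // Assumption: 3 seconds, matching the delay described above.
            var latency = MeasureStartLatency(TimeSpan.FromSeconds(3));
            Console.WriteLine($"run {i}: {latency.TotalMilliseconds:F3} ms");
        }
    }
}
```

Running the same program on .NET Core 3.1 and .NET 6 on the same machine would give a rough side-by-side view of thread start latency, although an idle probe like this may not reproduce delays that only appear inside a full test run.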
Configuration
Machine that was used to generate our benchmark figures:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19041.1348 (2004/May2020Update/20H1)
AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.100
Machine that generates our test failures: `windows-2019` shared agent on Azure DevOps.
Regression?
.NET Core 3.1 numbers:
.NET 6 numbers:
We’ve observed similar behavior in some of our non-public repositories as well.
Relevant benchmark:
https://github.com/akkadotnet/akka.net/tree/dev/src/benchmark/RemotePingPong
Managed `ThreadPool`-only benchmark (most relevant for this issue):
https://github.com/akkadotnet/akka.net/pull/5386
Analysis
We’re performing some testing now using the `COMPlus_ThreadPool_UsePortableThreadPool` environment variable to see how it affects our benchmark and testing behavior (see https://github.com/akkadotnet/akka.net/pull/5441), but we believe that these two issues may be related to the problems we’ve been observing on .NET 6:
About this issue
- State: closed
- Created 3 years ago
- Comments: 20 (9 by maintainers)
@mangod9 @kouvel 100% understood. We’ve been trying to research this problem on our end before we brought it to the CoreCLR team’s attention since we know your time is valuable, scarce, and spread across many projects. We will continue to do that and try to provide helpful information to you and your team.
@Aaronontheweb it seems this was resolved by https://github.com/dotnet/runtime/pull/68881 according to https://github.com/akkadotnet/akka.net/issues/5385#issuecomment-1311795534. Can this issue be closed, or is it tracking something else?