aspnetcore: "async" fails as the number of threads increases
UPDATE 03/05/2021:
A few months ago we discovered that this was somehow tied to the use of the async keyword, and have since been able to prove it in the AcmeWebApi test application. As @davidfowl has stated in our other thread, this is most likely a race condition. Since it is now virtually impossible to write any application that doesn’t use async in some way (due to the core library changes), it is not possible to run tests that only use synchronous code. I can say that when we did have synchronous code, we were seeing drastically higher performance than we are seeing now.
Leaving the following, as that is what we started the thread with:
We have run into several issues with ASP.NET Core that appear to be threading related. I initially created #26955, as that was the first issue that we ran into, but creating an application that can be tested is still ongoing. In the process of creating an application for that purpose, we were able to replicate another issue, which is the topic of this thread. The application linked below replicates this issue under the following conditions:
- 3,500+ concurrent clients (we are required to exceed 8,000).
- Average throughput in total across all connections is 6,000 req/s (this is our minimum for maximum throughput).
- VM has 2 dedicated CPUs and 4GB RAM.
Under these conditions we observe numerous HeartbeatSlow issues across random connections (threads), which in our full application leads to complete system failure over time. We are working on providing ways to replicate the other issues that we have observed, but these are currently the only ones that we can replicate for you in a test application.
This issue ONLY exists on Linux (we used Ubuntu 20.04.1 LTS for verification) and results in both a significant reduction of throughput and a significantly higher latency. When running in our full application, this issue, along with others, causes a complete system failure (APPCRASH) as the process runs out of memory. No matter how much we try, this issue, and the others, cannot be replicated in a Windows environment (Windows 10 and Windows Server 2019 were tested).
SDK: 3.1.301 VS: 16.8.2
The test application is available in the private repo AcmeWebApi.
FULL DISCLOSURE: I work for Webroot / Carbonite / OpenText and the application discussed above is the property of said entities. Microsoft is a direct / indirect customer of ours, so I am limited on the information that I am allowed to provide.
About this issue
- State: closed
- Created 4 years ago
- Comments: 52 (27 by maintainers)
Based on https://github.com/Kaelum/AcmeWebApi/blob/main/src/AcmeWebApi/Services/ApiService.cs#L329 and https://github.com/Kaelum/AcmeWebApi/blob/main/src/AcmeWebApi/Handlers/TcpHandler.cs#L415-L420, you are probably filling up the thread pool: you use 3,500 simultaneous connections that are all queued to the pool, and each one blocks the thread it runs on.
I don’t think it shows a bug in aspnet. Do you want to explain what you are trying to achieve to get some advice on how to approach it differently?
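To illustrate the thread pool concern raised above, here is a minimal sketch that is not taken from the AcmeWebApi code. It queues the same number of work items twice: once with a blocking wait and once with an asynchronous wait. The client count (200) and the 100 ms delay are assumptions chosen so the sketch finishes quickly; the thread reports 3,500+ clients. With only a couple of cores, the blocking run would be expected to take noticeably longer, because each item holds a pool thread for the whole wait and the pool adds threads only gradually.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StarvationSketch
{
    public static async Task Main()
    {
        const int clients = 200; // assumption: scaled down from the 3,500+ in the report

        // Blocking variant: every work item occupies a pool thread for 100 ms.
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, clients)
            .Select(_ => Task.Run(() => Thread.Sleep(100))));
        Console.WriteLine($"blocking: {sw.ElapsedMilliseconds} ms");

        // Async variant: the pool thread is released while the delay is pending.
        sw.Restart();
        await Task.WhenAll(Enumerable.Range(0, clients)
            .Select(_ => Task.Run(() => Task.Delay(100))));
        Console.WriteLine($"async: {sw.ElapsedMilliseconds} ms");
    }
}
```

The absolute numbers depend on the machine and runtime version, so treat the comparison as indicative only.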
If the application is running out of memory, collecting a memory dump might help. Once the application starts to struggle under load, try running `dotnet-dump collect` and taking a look at it with `dotnet-dump analyze`. `dumpheap -stat`, `clrstack -all` and `dumpasync` are all interesting commands to take a look at. https://docs.microsoft.com/en-us/dotnet/core/diagnostics/debug-memory-leak
@sebastienros identified blocking above:
Your core processing logic, processRequestBuffer (https://github.com/Kaelum/AcmeWebApi/blob/3593528fdb27e460ab332eec922ec292d61c28bc/src/AcmeWebApi/Handlers/TcpHandler.cs#L495), is being dispatched to the thread pool, and then dispatches again and blocks.
You should just Task.Run and make processRequestBuffer an async method. That’s likely one of the reasons for poor performance with respect to threading. There are lots of other things that could be done to improve the code, but those aren’t likely relevant to the discussion.
@Kaelum “works perfectly” might be a stretch. There might be some minor differences causing the behavior you’re seeing that we’re not aware of, but it doesn’t change the fact that the code should stop blocking threads.
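As a rough sketch of the advice above (a single Task.Run plus an async method, with no blocking), the two shapes might look like this. All names here (TcpHandlerSketch, ProcessCoreAsync, and the two ProcessRequestBuffer variants) are hypothetical stand-ins; the real processRequestBuffer in the repository is not reproduced, and Task.Delay only simulates whatever asynchronous work it performs.

```csharp
using System;
using System.Threading.Tasks;

public static class TcpHandlerSketch
{
    // Hypothetical stand-in for the real asynchronous work (I/O, downstream call).
    private static async Task<int> ProcessCoreAsync(byte[] buffer)
    {
        await Task.Delay(10);
        return buffer.Length;
    }

    // Anti-pattern: dispatches to the pool again, then blocks the calling
    // thread on .Result until the inner task completes.
    public static int ProcessRequestBufferBlocking(byte[] buffer)
    {
        return Task.Run(() => ProcessCoreAsync(buffer)).Result;
    }

    // Suggested shape: async end to end, so no thread is held while waiting.
    public static async Task<int> ProcessRequestBufferAsync(byte[] buffer)
    {
        return await ProcessCoreAsync(buffer);
    }

    public static async Task Main()
    {
        var buffer = new byte[128];

        // One dispatch to the thread pool, awaited rather than blocked on.
        int handled = await Task.Run(() => ProcessRequestBufferAsync(buffer));
        Console.WriteLine(handled);
    }
}
```

The design point is that the caller awaits the returned task instead of calling .Result or .Wait(), so a pool thread is only occupied while code is actually running.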
@Kaelum you should read my questions before you answer them
I am French, I can answer this way too