runtime: GC slow since .NET 7 Preview 5/6/7 somewhere.
Description
I run cluster tests. The setup is an AMD Ryzen 9 3950X 16-core processor with 64 GB RAM running a bunch of nodes that send messages to each other over localhost TCP. The application framework is optimized to induce as little GC pressure as possible. This is in fact the point of the framework and of this exercise: to see what the GC does with many threads, each causing very little GC pressure, but cumulatively at high rates.
When using PerfView, the allocation that happens the most is of the class Task. All the other allocations that cause GC pressure have been heapified by the framework.
Configuration
- Which version of .NET is the code running on? 7.0.100-preview.7.22354.3
- What OS version, and what distro if applicable? Windows 11
- What is the architecture (x64, x86, ARM, ARM64)? x64 (AMD Ryzen 9 3950X)
- If relevant, what are the specs of the machine? Asus x570f with 64 GB RAM
Regression?
Ever since around preview 5/6/7 (it felt like it got progressively worse) my tests started experiencing problems. At first the application runs fine, as it is busy injecting nodes into the cluster. At around node 200 (out of 700) the cluster starts to bog down. The logs pump for 1 second, freeze for 1 second, then pump again for 1 second. It's as if the GC goes into overdrive and collects every second, pausing the application for a second each time. PerfView reports that my application is 99% busy with GC. The only object being allocated en masse is System.Threading.Tasks.Task. What is strange is that once it goes into this mode, the pause timings are almost exactly 1 second long. If you leave the app running, it makes progress, although super slowly; eventually it leaves this state and suddenly starts working as normal again, catching up on things that failed because there was simply no CPU given to userspace.
Reproduce
Clone https://github.com/tactical-drone/zero and execute zero.sync after unsetting the settings listed below.
Analysis
After adding these settings to the GC, the problem goes away:
<ServerGarbageCollection>true</ServerGarbageCollection>
<ConcurrentGarbageCollection>true</ConcurrentGarbageCollection>
<GCHeapCount>16</GCHeapCount>
<GCCpuGroup>0</GCCpuGroup>
<GCNoAffinitize>false</GCNoAffinitize>
<GCLOHThreshold>120000</GCLOHThreshold>
<GCConserveMemory>7</GCConserveMemory>
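When comparing runs with and without the settings above, it can help to confirm which GC flavor the process actually started with. The following is a minimal sketch and not code from the zero repository; it only uses `GCSettings`, `GC.CollectionCount` and `GC.GetGCMemoryInfo` from the base class library, and it assumes a .NET 5 or later console project with top-level statements.

```csharp
using System;
using System.Runtime;

// Report which GC flavor this process is running with.
Console.WriteLine($"Server GC:    {GCSettings.IsServerGC}");
// Interactive means concurrent (background) GC is enabled, Batch means it is disabled.
Console.WriteLine($"Latency mode: {GCSettings.LatencyMode}");
Console.WriteLine($"Processors:   {Environment.ProcessorCount}");

// Cumulative number of collections per generation since process start.
for (int gen = 0; gen <= GC.MaxGeneration; gen++)
{
    Console.WriteLine($"Gen{gen} collections: {GC.CollectionCount(gen)}");
}

// Percentage of process time spent paused in GC so far, as reported by the runtime.
GCMemoryInfo info = GC.GetGCMemoryInfo();
Console.WriteLine($"Pause time %: {info.PauseTimePercentage}");
```

With the settings above applied, one would expect the first line to report `True`; without them, a console application defaults to workstation GC and should report `False`. Logging these values periodically from the test harness would also show whether the collection counts jump during the 1 second freezes.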
I don’t expect any fixes, but if there were changes, this is how they are affecting us right now. The reason I run bleeding edge is that I am still in R&D mode and this work is earmarked for long term dominance. Therefore, at this stage I test against bleeding edge and report any strange things that come up.
About this issue
- State: closed
- Created 2 years ago
- Comments: 25 (10 by maintainers)
Wait. But the conclusion could be pretty hilarious. The GC is so good it hides my fundamental memory leak. The server mode is just unbelievable. Holy 🤫
@Maoni0 Thanks for the advice! I will definitely adjust my trace strategy. Had I used this server mode from the beginning I might not have developed zero.core, which basically brings the memory management and teardown ticks (inside of IoNanoprobe) I developed in C++ over the years to .NET, making the GC a lender of last resort.
The progressively getting slow part is then because I introduced those turbo bits from the framework, causing an overall speedup that eventually saturates the GC in workstation mode. But server mode laughs it off. Fascinating data.
thanks for running on the bleeding edge and helping us discover problems, @tactical-drone!
it’d be great if you could please clarify a couple of things -
what did you mean by “stock gc”? meaning you don’t have these set?
when you said “get slower”, presumably you meant with the “stock gc” you saw the speed at which your app ran got slower with (approximately) each preview build since preview 5 (I’m not sure what “at the end of preview 4” meant, does that mean you were building the runtime yourself and you started observing the slowness when you built the runtime at the end of preview 4)? how much slower would you say you observed? and did you observe the same slowness or did it get progressively worse with each preview build?