runtime: .NET Core - code is very slow when run on a computer with many cores
Here’s the code:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

BenchmarkRunner.Run(typeof(Test));

public class Test
{
    [Benchmark]
    public Task Test_1() => TestCode(1);

    [Benchmark]
    public Task Test_Many() => TestCode(80);

    private static Task TestCode(int degreeOfParallelism)
        => ParallelForEach(Enumerable.Range(1, degreeOfParallelism), degreeOfParallelism, _ => MethodRunInParallel());

    private static Task MethodRunInParallel()
    {
        var set = new HashSet<int>();
        for (var i = 0; i < 100000; i++)
            set.Add(i);
        return Task.CompletedTask;
    }

    private static Task ParallelForEach<T>(IEnumerable<T> source, int degreeOfParallelism, Func<T, Task> asyncAction)
    {
        var tasks = Partitioner
            .Create(source)
            .GetPartitions(degreeOfParallelism)
            .Select(partition => Task.Run(async () =>
            {
                using (partition)
                    while (partition.MoveNext())
                        await asyncAction(partition.Current);
            }));
        return Task.WhenAll(tasks);
    }
}
lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 160
On-line CPU(s) list: 0-159
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 8
NUMA node(s): 8
Vendor ID: GenuineIntel
CPU family: 6
Model: 47
Model name: Intel(R) Xeon(R) CPU E7- 2860 @ 2.27GHz
Stepping: 2
CPU MHz: 1064.000
CPU max MHz: 2266,0000
CPU min MHz: 1064,0000
BogoMIPS: 4521.94
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 24576K
NUMA node0 CPU(s): 0-9,80-89
NUMA node1 CPU(s): 10-19,90-99
NUMA node2 CPU(s): 20-29,100-109
NUMA node3 CPU(s): 30-39,110-119
NUMA node4 CPU(s): 40-49,120-129
NUMA node5 CPU(s): 50-59,130-139
NUMA node6 CPU(s): 60-69,140-149
NUMA node7 CPU(s): 70-79,150-159
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb kaiser tpr_shadow vnmi flexpriority ept vpid dtherm ida arat
dotnet --info:
.NET SDK (reflecting any global.json):
Version: 5.0.300
Commit: 2e0c8c940e
Runtime Environment:
OS Name: debian
OS Version: 9
OS Platform: Linux
RID: debian.9-x64
Base Path: /usr/share/dotnet/sdk/5.0.300/
Host (useful for support):
Version: 5.0.6
Commit: 478b2f8c0e
.NET SDKs installed:
5.0.300 [/usr/share/dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 5.0.6 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 5.0.6 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Result of taskset -c 0-9 dotnet run -c Release
Result of dotnet run -c Release.
Using all physical cores seems to slow execution down by a factor of about 53. I would expect some overhead, but not this much.
Is there a way to set up my .NET Core application so it uses multiple cores better?
About this issue
- State: closed
- Created 3 years ago
- Comments: 20 (16 by maintainers)
You’re creating a hash set for each thread and reallocating its backing array several times as it grows, so at minimum the multi-threaded version isn’t using the same amount of memory as the single-threaded one. The element data alone is ~30 MB (80 sets × 100,000 four-byte ints), before counting the set's own bookkeeping or the reallocations. A naive first check would be whether you are experiencing GC stalls. If you pre-size the HashSet, does the situation improve?
Also, the runtime only creates so many threads by default. What is ThreadPool.ThreadCount before execution? And what do ThreadPool.GetMaxThreads() / ThreadPool.GetMinThreads() return?
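A minimal sketch of how those checks could be gathered in one place, assuming a .NET 5 console project with top-level statements. The HashSetProbe class and its BuildPreSized method are hypothetical names introduced for illustration and are not part of the original benchmark; the gen-0 collection count is only a rough signal of GC pressure, not a measurement of stall time.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime;
using System.Threading;

// Thread-pool state before any work is queued.
ThreadPool.GetMinThreads(out var minWorkers, out var minIo);
ThreadPool.GetMaxThreads(out var maxWorkers, out var maxIo);
Console.WriteLine($"Processors:  {Environment.ProcessorCount}");
Console.WriteLine($"ThreadCount: {ThreadPool.ThreadCount}");
Console.WriteLine($"Min threads: worker={minWorkers}, IO={minIo}");
Console.WriteLine($"Max threads: worker={maxWorkers}, IO={maxIo}");
Console.WriteLine($"Server GC:   {GCSettings.IsServerGC}");

// Count gen-0 collections around the workload as a rough GC-pressure signal.
var gen0Before = GC.CollectionCount(0);
var set = HashSetProbe.BuildPreSized(100_000);
var gen0After = GC.CollectionCount(0);
Console.WriteLine($"Items: {set.Count}, gen-0 GCs during build: {gen0After - gen0Before}");

// Hypothetical helper (assumption): not part of the original benchmark.
public static class HashSetProbe
{
    public static HashSet<int> BuildPreSized(int count)
    {
        // Passing the capacity up front avoids the repeated resizes
        // that an initially empty HashSet goes through while growing.
        var set = new HashSet<int>(count);
        for (var i = 0; i < count; i++)
            set.Add(i);
        return set;
    }
}
```

Swapping new HashSet<int>() for new HashSet<int>(100000) inside MethodRunInParallel would be the corresponding change in the benchmark itself; whether it narrows the gap between Test_1 and Test_Many is exactly what the question above is asking.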