runtime: for loop a little slower on .NET Core 3.0
The (very) basic benchmark below shows that a for loop on .NET Core 3.0 is a little slower than on .NET Core 2.1.5. What surprised me most is that foreach is “a lot” faster than a for loop on .NET Core 3.0.
using BenchmarkDotNet.Attributes;
using System.Linq;

namespace Benchmarks
{
    public class LoopBenchmark
    {
        private int[] _ints;
        private string[] _strings;

        [Benchmark]
        public void Int32_ForEach()
        {
            foreach (var s in _ints)
            {
                if (s == -1)
                    break;
            }
        }

        [Benchmark]
        public void Int32_For()
        {
            for (var i = 0; i < _ints.Length; i++)
            {
                if (_ints[i] == -1)
                    break;
            }
        }

        [Benchmark]
        public void String_ForEach()
        {
            foreach (var s in _strings)
            {
                if (s == null)
                    break;
            }
        }

        [Benchmark]
        public void String_For()
        {
            for (var i = 0; i < _strings.Length; i++)
            {
                if (_strings[i] == null)
                    break;
            }
        }

        [GlobalSetup]
        public void GlobalSetup()
        {
            _ints = Enumerable.Range(1, 100).ToArray();
            _strings = _ints.Select(p => p.ToString()).ToArray();
        }
    }
}
.NET Core 3.0
BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17134.345 (1803/April2018Update/Redstone4)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=3914070 Hz, Resolution=255.4885 ns, Timer=TSC
.NET Core SDK=3.0.100-alpha1-009640
[Host] : .NET Core 3.0.0-preview1-27004-04 (CoreCLR 4.6.27003.04, CoreFX 4.6.27003.02), 64bit RyuJIT
DefaultJob : .NET Core 3.0.0-preview1-27004-04 (CoreCLR 4.6.27003.04, CoreFX 4.6.27003.02), 64bit RyuJIT
Method | Mean | Error | StdDev |
---|---|---|---|
Int32_ForEach | 40.81 ns | 0.2533 ns | 0.2369 ns |
Int32_For | 56.18 ns | 0.8982 ns | 0.8402 ns |
String_ForEach | 41.19 ns | 0.2446 ns | 0.2169 ns |
String_For | 56.52 ns | 0.3497 ns | 0.3271 ns |
.NET Core 2.1.5
BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17134.345 (1803/April2018Update/Redstone4)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=3914070 Hz, Resolution=255.4885 ns, Timer=TSC
.NET Core SDK=3.0.100-alpha1-009640
[Host] : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
DefaultJob : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
Method | Mean | Error | StdDev |
---|---|---|---|
Int32_ForEach | 52.81 ns | 0.3626 ns | 0.3392 ns |
Int32_For | 54.38 ns | 0.2734 ns | 0.2557 ns |
String_ForEach | 52.76 ns | 0.3920 ns | 0.3274 ns |
String_For | 54.35 ns | 0.3005 ns | 0.2811 ns |
I always (wrongly?) assumed that a for loop would produce far (?) more optimized code.
category:cq theme:loop-opt skill-level:intermediate cost:medium
About this issue
- State: open
- Created 6 years ago
- Reactions: 1
- Comments: 15 (14 by maintainers)
Looks like in the faster ForEach case the first four instructions in the loop are all in the same 16-byte bundle, while in the slower For case they split across two bundles (the cmp hangs over just a bit). So loop alignment still seems like the best guess.
If you have access to the hardware perf counters (say via VTune) you could confirm this: you should see similar instructions retired but higher clocks (and hence higher CPI).
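Without perf counters, a rough cycles-per-iteration figure can still be derived from the reported means. The sketch below is only back-of-the-envelope arithmetic: it assumes the nominal 4.00 GHz clock from the BenchmarkDotNet header (ignoring turbo and frequency scaling) and 100 iterations per call, since the arrays hold 100 elements and the break condition never fires. The class and method names are made up for illustration.

```csharp
// Hypothetical helper, not part of the original benchmark.
public static class CycleEstimate
{
    // Converts a mean time per benchmark call into approximate cycles per loop iteration.
    // meanNanoseconds: mean reported by BenchmarkDotNet for one call
    // clockGHz:        assumed constant core clock in GHz (cycles per nanosecond)
    // iterations:      loop iterations executed per call
    public static double CyclesPerIteration(double meanNanoseconds, double clockGHz, int iterations)
    {
        return meanNanoseconds * clockGHz / iterations;
    }
}
```

Plugging in the .NET Core 3.0 Int32 means from the tables above (56.18 ns for For, 40.81 ns for ForEach) gives roughly 2.2 versus 1.6 cycles per iteration under those assumptions. That is a timing-derived estimate, not a measured CPI.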
Alternatively, you can try adding some padding by hand to see if you can influence the results (for better or worse): say, add an out param to the method and store to it above the loop (or store to it more than once, etc.). If doing that drastically alters perf for either variant, then the code is highly alignment sensitive.
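A minimal sketch of that padding experiment, assuming the loop is pulled out into standalone helper methods (the type and method names here are hypothetical, and the [Benchmark] wrappers that would call them are left out). The loops return the index so the result is observable, which differs slightly from the void methods in the original benchmark. How many bytes each extra store adds, and whether the JIT keeps every store, is not guaranteed, so the disassembly should be checked.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical helper type for the alignment experiment.
public static class AlignmentProbe
{
    // Baseline: same loop shape as Int32_For, no extra stores.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ForBaseline(int[] ints)
    {
        var i = 0;
        for (; i < ints.Length; i++)
        {
            if (ints[i] == -1)
                break;
        }
        return i; // index of the first -1, or ints.Length if none
    }

    // One store through an out param above the loop, intended to shift
    // the loop body by a few bytes of machine code.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ForPadded(int[] ints, out int pad)
    {
        pad = 0;
        var i = 0;
        for (; i < ints.Length; i++)
        {
            if (ints[i] == -1)
                break;
        }
        return i;
    }

    // Two stores (to two out params), intended to shift the loop a bit further.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int ForPaddedTwice(int[] ints, out int pad1, out int pad2)
    {
        pad1 = 0;
        pad2 = 0;
        var i = 0;
        for (; i < ints.Length; i++)
        {
            if (ints[i] == -1)
                break;
        }
        return i;
    }
}
```

If the three variants benchmark very differently even though the loop itself is unchanged, that would point toward alignment sensitivity rather than a difference in the generated loop code.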
@omariom Tiered JIT was indeed still enabled by default in the preview build I used. With Tiered JIT disabled, the generated code is identical.
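For anyone reproducing this, one way to turn Tiered JIT off is the TieredCompilation MSBuild property in the benchmark project's .csproj. The fragment below is a sketch that goes inside the existing Project element; the rest of the project file is assumed. Whether the project that BenchmarkDotNet generates for the run picks this setting up may depend on the BenchmarkDotNet version, so setting the environment variable COMPlus_TieredCompilation=0 before launching the benchmark is an alternative worth trying.

```xml
<PropertyGroup>
  <!-- Disable tiered compilation so methods are compiled with full optimizations on first call -->
  <TieredCompilation>false</TieredCompilation>
</PropertyGroup>
```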
With Tiered JIT disabled, foreach is way faster than for (including the change @benaadams suggested) in .NET Core 3.0:
.NET Core 2.1.5:
.NET Core 3.0 Preview1
As you can see, foreach has gotten quite a bit faster with .NET Core 3.0.
I think this might be caused by https://github.com/dotnet/coreclr/pull/15756?
/cc @mikedn