runtime: for loop a little slower on .NET Core 3.0

The (very) basic benchmark below shows that a for loop on .NET Core 3.0 is a little slower than on .NET Core 2.1.5. What surprised me most is that foreach is “a lot” faster than a for loop on .NET Core 3.0.


using BenchmarkDotNet.Attributes;
using System.Linq;

namespace Benchmarks
{
    public class LoopBenchmark
    {
        private int[] _ints;
        private string[] _strings;

        [Benchmark]
        public void Int32_ForEach()
        {
            foreach (var s in _ints)
            {
                if (s == -1)
                    break;
            }
        }

        [Benchmark]
        public void Int32_For()
        {
            for (var i = 0; i < _ints.Length; i++)
            {
                if (_ints[i] == -1)
                    break;
            }
        }

        [Benchmark]
        public void String_ForEach()
        {
            foreach (var s in _strings)
            {
                if (s == null)
                    break;
            }
        }

        [Benchmark]
        public void String_For()
        {
            for (var i = 0; i < _strings.Length; i++)
            {
                if (_strings[i] == null)
                    break;
            }
        }

        [GlobalSetup]
        public void GlobalSetup()
        {
            _ints = Enumerable.Range(1, 100).ToArray();
            _strings = _ints.Select(p => p.ToString()).ToArray();
        }
    }
}

.NET Core 3.0


BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17134.345 (1803/April2018Update/Redstone4)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=3914070 Hz, Resolution=255.4885 ns, Timer=TSC
.NET Core SDK=3.0.100-alpha1-009640
  [Host]     : .NET Core 3.0.0-preview1-27004-04 (CoreCLR 4.6.27003.04, CoreFX 4.6.27003.02), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-preview1-27004-04 (CoreCLR 4.6.27003.04, CoreFX 4.6.27003.02), 64bit RyuJIT


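|         Method |     Mean |     Error |    StdDev |
|--------------- |---------:|----------:|----------:|
|  Int32_ForEach | 40.81 ns | 0.2533 ns | 0.2369 ns |
|      Int32_For | 56.18 ns | 0.8982 ns | 0.8402 ns |
| String_ForEach | 41.19 ns | 0.2446 ns | 0.2169 ns |
|     String_For | 56.52 ns | 0.3497 ns | 0.3271 ns |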

.NET Core 2.1.5


BenchmarkDotNet=v0.11.2, OS=Windows 10.0.17134.345 (1803/April2018Update/Redstone4)
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
Frequency=3914070 Hz, Resolution=255.4885 ns, Timer=TSC
.NET Core SDK=3.0.100-alpha1-009640
  [Host]     : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.5 (CoreCLR 4.6.26919.02, CoreFX 4.6.26919.02), 64bit RyuJIT


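|         Method |     Mean |     Error |    StdDev |
|--------------- |---------:|----------:|----------:|
|  Int32_ForEach | 52.81 ns | 0.3626 ns | 0.3392 ns |
|      Int32_For | 54.38 ns | 0.2734 ns | 0.2557 ns |
| String_ForEach | 52.76 ns | 0.3920 ns | 0.3274 ns |
|     String_For | 54.35 ns | 0.3005 ns | 0.2811 ns |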

I always (wrongly?) assumed that a for loop would produce far (?) more optimized code.

category:cq theme:loop-opt skill-level:intermediate cost:medium

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 1
  • Comments: 15 (14 by maintainers)

Most upvoted comments

Looks like in the faster ForEach case the first four instructions in the loop are all in the same 16-byte bundle, while in the slower For case they are split across two bundles (the cmp hangs over just a bit). So loop alignment still seems like the best guess.

If you have access to the hardware perf counters (say via VTune), you could confirm this: you should see a similar number of instructions retired but more clock cycles (and hence a higher CPI).

Alternatively, you can try adding some padding by hand to see if you can influence the results (for better or worse): say, add an out param to the method and store to it above the loop (or store to it more than once, etc.). If doing that drastically alters perf for either variant, then the code is highly alignment sensitive.
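The padding experiment described above could look roughly like the sketch below. It is only an illustration: the class name `PaddedLoopSketch` and the method name `Int32_For_Padded` are made up here, and the loop is copied from `Int32_For` in the original benchmark. The idea is that the stores to the out param emit a few extra code bytes ahead of the loop, which may shift where the loop body starts.

```csharp
using System.Linq;
using System.Runtime.CompilerServices;

public class PaddedLoopSketch
{
    // Same data as the original benchmark: 1..100, so -1 is never found
    // and the loop always runs over all 100 elements.
    private int[] _ints = Enumerable.Range(1, 100).ToArray();

    // Hypothetical padded variant of Int32_For (not from the thread).
    // NoInlining keeps this a separate method body, so a caller cannot
    // absorb the loop and lay it out differently.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public void Int32_For_Padded(out int pad)
    {
        pad = 0; // store above the loop: extra code bytes ahead of the loop
        pad = 1; // optional second store, to move the loop a little further

        for (var i = 0; i < _ints.Length; i++)
        {
            if (_ints[i] == -1)
                break;
        }
    }
}
```

To measure it, a parameterless `[Benchmark]` method would probably have to wrap the call, which adds call overhead, so the unpadded variant should be wrapped the same way before comparing. The JIT is also free to merge or drop redundant stores, so it is worth checking the disassembly to confirm the loop actually moved.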

@omariom Tiered JIT was indeed still enabled by default in the preview build I used. With Tiered JIT disabled, the generated code is identical.

With Tiered JIT disabled, foreach is way faster than for (including the change @benaadams suggested) in .NET Core 3.0:

.NET Core 2.1.5:

|        Method |     Mean |     Error |    StdDev |
|-------------- |---------:|----------:|----------:|
| Int32_ForEach | 51.96 ns | 0.0162 ns | 0.0136 ns |
|     Int32_For | 52.21 ns | 0.3361 ns | 0.2979 ns |

.NET Core 3.0 Preview 1:

|        Method |     Mean |     Error |    StdDev |
|-------------- |---------:|----------:|----------:|
| Int32_ForEach | 36.86 ns | 0.0186 ns | 0.0174 ns |
|     Int32_For | 53.79 ns | 0.0984 ns | 0.0872 ns |

As you can see, foreach has gotten quite a bit faster with .NET Core 3.0.
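To put the .NET Core 3.0 numbers in terms of clock cycles, the means can be converted to an approximate cycles-per-iteration figure. This is a back-of-the-envelope sketch, not a measurement: it assumes the core runs at the nominal 4.00 GHz from the environment header (turbo or throttling would change this), that all of the measured time is spent in the loop, and that the loop runs all 100 iterations (the array holds 1..100, so -1 is never found). The name `CycleEstimate` is made up for this example.

```csharp
public static class CycleEstimate
{
    // Assumption: nominal clock of the i7-6700K from the benchmark header.
    public const double AssumedGHz = 4.0;

    // The benchmark arrays hold 100 elements and the loop never exits early.
    public const int Iterations = 100;

    // ns * (cycles per ns) gives cycles per call; divide by the iteration count.
    public static double CyclesPerIteration(double meanNs)
    {
        return meanNs * AssumedGHz / Iterations;
    }
}
```

Under those assumptions, `CycleEstimate.CyclesPerIteration(36.86)` gives about 1.5 cycles per iteration for foreach and `CycleEstimate.CyclesPerIteration(53.79)` about 2.2 for the for loop. A gap of well under one cycle per iteration is small enough that code layout is a plausible cause.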

I think this might be caused by https://github.com/dotnet/coreclr/pull/15756?

If we are fetching an Array Length for an array ref that came from global memory then for CSE safety we must use the conservative value number for both

/cc @mikedn
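For context on the quoted sentence: in `Int32_For` the array reference is read from the field `_ints`, which is the "array ref that came from global memory" case. A common way to take the field out of the picture is to copy the reference into a local before the loop, as sketched below. This is an illustration only: the class and method names are made up, the methods return an index (unlike the original void benchmarks) so the two variants can be checked against each other, and nothing here is claimed to be the change discussed elsewhere in the thread or to alter the generated code on any particular runtime.

```csharp
using System.Linq;

public class LocalCopySketch
{
    private int[] _ints = Enumerable.Range(1, 100).ToArray();

    // Array reference and Length are read through the field, as in Int32_For.
    public int IndexViaField()
    {
        for (var i = 0; i < _ints.Length; i++)
        {
            if (_ints[i] == -1)
                return i;
        }
        return -1; // not found
    }

    // The reference is copied to a local once; the loop then only touches
    // the local, so no other code can swap the array out mid-loop.
    public int IndexViaLocal()
    {
        var ints = _ints;
        for (var i = 0; i < ints.Length; i++)
        {
            if (ints[i] == -1)
                return i;
        }
        return -1; // not found
    }
}
```

Comparing the disassembly of the two methods on each runtime would show whether the field access is what makes the difference.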