runtime: Performance Regression with .NET 8 Preview 6

Description

I was recently playing around with FrozenCollections for my little side project IndexedSet to speed up the dictionary lookups that occur frequently. While that worked, I noticed that some areas are significantly slower with .NET 8 Preview 6. Most of the benchmarks are, as expected, either within the margin of error or faster, but the trie-based lookups are slower:

Context: IndexedSet provides an easy way to create in-memory indices on a collection without having to maintain multiple collections / dictionaries. The Prefix / Fulltext indices are associative tries: each node holds a dictionary (System.Collections.Generic.Dictionary<char, TrieNode>, default comparer) associating the next char with its child node, as well as the node's elements. Lookup then simply traverses the trie recursively, one character of the given key at a time.
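To make the data structure concrete, here is a minimal sketch of such a trie in C#. It is not IndexedSet's actual implementation: the type name TrieNode<T> and the methods Add, StartsWith, Collect and AddSubtree are hypothetical and chosen only for illustration. It shows the part that matters for the benchmark, namely one Dictionary<char, ...> lookup per character of the key.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: all names here are assumptions, not IndexedSet's real types.
public sealed class TrieNode<T>
{
    // One dictionary per node, keyed by the next character (default comparer).
    private readonly Dictionary<char, TrieNode<T>> _children = new Dictionary<char, TrieNode<T>>();

    // Elements whose key ends exactly at this node.
    private readonly List<T> _elements = new List<T>();

    public void Add(string key, T element) => Add(key.AsSpan(), element);

    private void Add(ReadOnlySpan<char> key, T element)
    {
        if (key.IsEmpty)
        {
            _elements.Add(element);
            return;
        }

        if (!_children.TryGetValue(key[0], out var child))
        {
            child = new TrieNode<T>();
            _children.Add(key[0], child);
        }

        child.Add(key.Slice(1), element);
    }

    // Returns every element whose key starts with the given prefix.
    public List<T> StartsWith(string prefix)
    {
        var result = new List<T>();
        Collect(prefix.AsSpan(), result);
        return result;
    }

    // Walks down one node per prefix character, then gathers the whole subtree.
    private void Collect(ReadOnlySpan<char> prefix, List<T> result)
    {
        if (prefix.IsEmpty)
        {
            AddSubtree(result);
            return;
        }

        if (_children.TryGetValue(prefix[0], out var child))
        {
            child.Collect(prefix.Slice(1), result);
        }
    }

    private void AddSubtree(List<T> result)
    {
        result.AddRange(_elements);
        foreach (var child in _children.Values)
        {
            child.AddSubtree(result);
        }
    }
}
```

With this shape, a prefix lookup costs one dictionary probe per prefix character plus a walk over the matching subtree, so changes in how the JIT inlines or compiles the dictionary lookup and the small recursive methods can plausibly show up directly in the measured time.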

Benchmark results:

BenchmarkDotNet v0.13.7, Windows 11 (10.0.22621.2070/22H2/2022Update/SunValley2)
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK 8.0.100-preview.6.23330.14
  [Host]   : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  .NET 6.0 : .NET 6.0.20 (6.0.2023.32017), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.9 (7.0.923.32018), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.0 (8.0.23.32907), X64 RyuJIT AVX2
| Method                | Job      | Runtime  | Mean       | Error    | StdDev   | Ratio | Gen0   | Allocated | Alloc Ratio |
|-----------------------|----------|----------|------------|----------|----------|-------|--------|-----------|-------------|
| StartsWith_Linq       | .NET 6.0 | .NET 6.0 | 3,862.3 ns | 50.22 ns | 46.97 ns | 1.00  | 0.0076 | 128 B     | 1.00        |
| StartsWith_IndexedSet | .NET 6.0 | .NET 6.0 | 612.4 ns   | 1.03 ns  | 0.86 ns  | 0.16  | 0.0086 | 144 B     | 1.12        |
| StartsWith_Linq       | .NET 7.0 | .NET 7.0 | 2,675.9 ns | 35.73 ns | 31.67 ns | 1.00  | 0.0076 | 128 B     | 1.00        |
| StartsWith_IndexedSet | .NET 7.0 | .NET 7.0 | 596.0 ns   | 2.65 ns  | 2.35 ns  | 0.22  | 0.0086 | 144 B     | 1.12        |
| StartsWith_Linq       | .NET 8.0 | .NET 8.0 | 2,239.4 ns | 21.98 ns | 18.36 ns | 1.00  | 0.0076 | 128 B     | 1.00        |
| StartsWith_IndexedSet | .NET 8.0 | .NET 8.0 | 770.4 ns   | 2.23 ns  | 1.98 ns  | 0.34  | 0.0086 | 144 B     | 1.12        |

I created a branch for this, net8-preview6-perf. Running the benchmarks is simple: clone the repo, switch to the branch, build, and run the PrefixIndexBenchmarks (and FullTextIndexBenchmarks) benchmarks. For the further analysis, I started by looking into PrefixIndexBenchmarks.

Analysis

Profiling shows different top functions: With 8.0: image

With 7.0: image

However, the profiler tells me that most of the time is spent on the null check (which I doubt): image

Looking at the jitted code (using DisassemblyDiagnoser), it seems that .NET 7 had a whole bunch of methods inlined; in fact, all the methods of the entire Trie seem to be inlined, if I read the asm correctly. However, using the InliningDiagnoser, I could not confirm my suspicion, even though there are differences.

So here I am now, not sure how to continue, and a bit confused why GetAll and AddRecursivlyToResult show up as failed inlining in both .NET 7 & 8 while I can only find the methods in the assembly code for .NET 8… I’d be happy to learn why that is the case.

Is this regression expected?

  • If yes => what is the reason, and what do you recommend as mitigation? Trying to specify the behavior (e.g. via MethodImpl)? (I have some other low-hanging fruit, but it could well be that the gains are eaten up by the regression.)
  • If not, I’m happy to do further investigation, but would most likely need some help with that.
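For reference, specifying the behavior via MethodImpl could look like the minimal sketch below. The class and method names (InliningHints, IsLeaf, SumRecursive) are hypothetical and not taken from IndexedSet. Note that MethodImplOptions.AggressiveInlining is only a request to the JIT, which may still decline to inline a method, and whether such hints would recover the regression described here is untested.

```csharp
using System.Runtime.CompilerServices;

// Illustrative only: hypothetical helper methods, not part of IndexedSet.
public static class InliningHints
{
    // Asks the JIT to inline this small method at its call sites if possible.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsLeaf(int childCount) => childCount == 0;

    // Asks the JIT to keep this method out of line, e.g. to keep a hot caller small.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumRecursive(int[] values, int index) =>
        index >= values.Length ? 0 : values[index] + SumRecursive(values, index + 1);
}
```

Any such attribute is best validated by comparing the disassembly and benchmark numbers on each runtime, since the effect can differ between .NET versions.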

And because it usually is done too rarely: Keep up the good work, I really enjoy .NET!

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (14 by maintainers)

Most upvoted comments

Would that be in https://github.com/dotnet/performance/tree/main/src/benchmarks/real-world, similar to, for example, ImageSharp? Would you suggest that I do something with the results in the benchmarks, or leave them as they are? We use the package extensively in one of our projects, and I could create less synthetic benchmarks based on that…

Yes, that’s the right spot. A less synthetic benchmark would be preferable, but if it’s a lot of work then consider adding what you already have.

On my main dev machine I see 8 preview 7 as faster…

BenchmarkDotNet v0.13.7, Windows 11 (10.0.22621.2134/22H2/2022Update/SunValley2)
Intel Core i7-8700 CPU 3.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK 8.0.100-preview.7.23376.3
  [Host]   : .NET 7.0.10 (7.0.1023.36312), X64 RyuJIT AVX2
  .NET 7.0 : .NET 7.0.10 (7.0.1023.36312), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.0 (8.0.23.37506), X64 RyuJIT AVX2

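| Method                | Job      | Runtime  | Mean     | Error    | StdDev   | Gen0   | Code Size | Allocated |
|-----------------------|----------|----------|----------|----------|----------|--------|-----------|-----------|
| StartsWith_IndexedSet | .NET 7.0 | .NET 7.0 | 858.9 ns | 16.55 ns | 15.48 ns | 0.0229 | 3,885 B   | 144 B     |
| StartsWith_IndexedSet | .NET 8.0 | .NET 8.0 | 806.8 ns | 13.41 ns | 11.89 ns | 0.0229 | 3,658 B   | 144 B     |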

(note this is an Intel box, not an AMD one)

Let me drill in a bit, and perhaps try this on some of my other machines.