roslyn: Major performance regression of the array initialization pattern
In https://github.com/dotnet/performance/pull/2534 I’ve bumped BDN version to see if @MichalStrehovsky fix (https://github.com/dotnet/BenchmarkDotNet/pull/2046) helped. It did, but new issue popped out.
86 out of 4334 benchmarks that were working fine in the past (2 months ago was the last time I run them), are now failing with PlatformNotSupportedException
:
System.PlatformNotSupportedException: Operation is not supported on this platform.
at Internal.Runtime.CompilerHelpers.ThrowHelpers.ThrowPlatformNotSupportedException() + 0x24
at System.Net.Primitives.Tests.IPAddressPerformanceTests.<ByteAddresses>d__0.MoveNext() + 0x45
at System.Linq.Enumerable.<OfTypeIterator>d__65`1.MoveNext() + 0x6b
at System.Linq.Enumerable.MaxInteger[TSource, TResult](IEnumerable`1, Func`2) + 0x48
at System.Net.Primitives.Tests.IPAddressPerformanceTests..ctor() + 0x9d
at BenchmarkDotNet.Autogenerated.Runnable_1..ctor() + 0x2a
at BenchmarkDotNet.Autogenerated.Runnable_1.Run(IHost, String) + 0x6e
at BenchmarkDotNet.Autogenerated.UniqueProgramName.AfterAssemblyLoadingAttached(String[]) + 0x1a2
for code that seems to be very simple:
I have moved this code to a standalone app and was not able to repro the issue.
Repro steps:
git clone https://github.com/dotnet/performance.git
cd performance
py .\scripts\benchmarks_ci.py -f nativeaot7.0 --filter System.Net.Primitives.Tests.IPAddressPerformanceTests.Ctor_Span --bdn-arguments "--keepFiles true"
BDN reports the path where to project is stored (the last path in the line below):
// start dotnet build -c Release -r win-x64 --no-restore /p:UseSharedCompilation=false /p:BuildInParallel=false /m:1 /p:Deterministic=true /p:Optimize=true /p:NuGetPackageRoot="D:\projects\performance\artifacts\packages" in D:\projects\performance\artifacts\bin\MicroBenchmarks\Release\net7.0\Job-NYBBDR
cc @jkotas
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (15 by maintainers)
Part of this is the existing overhead. For large allocations, the costs can be fairly visible and there are cases where those are frequent enough or where they are happening in contexts where the costs do matter.
This is no more a trap than
NativeMemory.Alloc
and whileNativeMemory.Alloc
is often a suitable choice, there are many scenarios where it is not.I don’t think that a performance oriented API should be a performance trap. If we want people to be aware of the potential pitfalls of it, that’s something that
[UnsafeCallersOnly]
or similar should be for. As the API stands today, it has paths in it to handle the small case but to use it “correctly” (even in cases where it is actually needed), the devs realistically need to duplicate those same checks.I’m all for telling people “this is dangerous, be mindful of the holes”. However, I don’t think we should be intentionally making it more problematic to use correctly where there is a decently simple fix that can be made.
Quick standalone performance test:
Run using:
.NET 7 SDK Preview 6: 390ms Bleeding edge Roslyn (4.4.0-1.22371.1): 10899ms
As you can see, this change introduced 27x regression of the simple array initialization pattern used in this micro-benchmark.
BTW: I do not see any performance numbers in #62392 . In dotnet/runtime, we require performance numbers demonstrating the improvement for all performance optimization PRs. It is not unusual for performance optimizations to not have the intended effect.
(EDIT: Corrected the perf numbers.)
I disagree. We are using it correctly in our own CoreLib in StringBuilder, ArrayPool and number of other places. Number of these places were heavily micro-benchmarked when the uses of this API were introduced to make sure that there is no regression, and none of them have the checks duplicated.
I do not think that the fix is simple. The GC is heavily optimized for giving you zero-initialized memory. If we were to build uninitialized memory path that is as optimized as the existing zero-initialized path, it would be fairly complex change, and it would be hard to make it pay-for-play.
Submitted doc update at https://github.com/dotnet/dotnet-api-docs/pull/8249
Yes, we can do all that, but the benefits are vanishingly small.
GC.AllocateUninitializedArray
is also a security trap due to uninitialized data. I kind of like that the API is a performance trap since it forces people to use it only in places where it is actually needed.The IL for the method in question looks like this - there’s a GC::AllocateUninitializedArray instead of newarr. This is breaking pattern recognition in RyuJIT. In NativeAOT this is a
PlatformNotSupported
becauseRuntimeHelpers::InitializeArray
needs to be expanded as an intrinsic. In CoreCLR, breaking the pattern recognition is a deoptimization. Good thing we found it.Cc @jaredpar to make sure this is intentional new codegen for this.
RyuJIT will need to be adapted to recognize the new pattern.