runtime: Performance regression: 6x slower array allocation on Alpine
According to data from @danmosemsft, collected by running the dotnet/performance microbenchmarks on Alpine 3.11 via WSL2, allocating arrays of both value and reference types has become 6 times slower compared to .NET Core 3.1.
Initially, I thought it was just an outlier, but I see the same pattern for other collections that use arrays internally (queue, list, stack, etc.). The regression is specific to Alpine; Ubuntu 18.04 (with and without WSL2) is fine.
@jkotas @janvorli who would be the best person to investigate that?
Repro
```sh
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter 'System.Collections.CtorGivenSize<Int32>.Array'
```
System.Collections.CtorGivenSize<Int32>.Array(Size: 512)
Conclusion | Base | Diff | Base/Diff | Modality | Operating System | Bit | Processor Name | Base Runtime | Diff Runtime |
---|---|---|---|---|---|---|---|---|---|
Same | 181.79 | 183.96 | 0.99 |  | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz | .NET Core 3.1.6 | 5.0.100-rc.1.20413.9 |
Same | 92.89 | 94.47 | 0.98 |  | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | .NET Core 3.1.6 | 5.0.100-rc.1.20404.3 |
Same | 96.05 | 94.36 | 1.02 |  | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | .NET Core 3.1.6 | 5.0.100-rc.1.20418.3 |
Same | 114.74 | 111.94 | 1.03 |  | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | .NET Core 3.1.6 | 5.0.100-rc.1.20413.9 |
Same | 80.49 | 79.98 | 1.01 |  | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | .NET Core 3.1.6 | 5.0.100-rc.1.20419.9 |
Same | 67.30 | 67.66 | 0.99 | bimodal | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | .NET Core 3.1.6 | 5.0.100-rc.1.20418.3 |
Same | 86.10 | 79.17 | 1.09 | bimodal | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | .NET Core 3.1.6 | 5.0.100-rc.1.20419.14 |
Same | 97.50 | 98.77 | 0.99 |  | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | .NET Core 3.1.6 | 5.0.100-rc.1.20420.14 |
Slower | 127.02 | 150.46 | 0.84 | bimodal | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | .NET Core 3.1.6 | 5.0.100-rc.1.20419.5 |
Slower | 193.61 | 287.83 | 0.67 | bimodal | ubuntu 18.04 | Arm64 | Unknown processor | .NET Core 3.1.6 | 6.0.100-alpha.1.20421.6 |
Same | 99.85 | 103.42 | 0.97 |  | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | .NET Core 3.1.6 | 5.0.100-rc.1.20403.23 |
Slower | 138.73 | 151.37 | 0.92 |  | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | .NET Core 3.1.6 | 5.0.100-rc.1.20404.2 |
Slower | 72.85 | 515.56 | 0.14 |  | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | .NET Core 3.1.6 | 6.0.100-alpha.1.20421.6 |
Slower | 78.85 | 90.76 | 0.87 |  | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | .NET Core 3.1.6 | 5.0.100-rc.1.20418.3 |
About this issue
- State: closed
- Created 4 years ago
- Comments: 36 (36 by maintainers)
The container was missing libgdiplus on Alpine, but I was able to resolve that by installing it from http://dl-3.alpinelinux.org/alpine/edge/testing/. I was able to validate that the fix makes the perf comparable with 3.1:
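We can also gather this information from `/sys/devices/system/cpu` without doing Intel/AMD-specific kung-fu. For example, `/sys/devices/system/cpu/cpu0/cache/index0/size`, `/sys/devices/system/cpu/cpu0/cache/index1/size`, `/sys/devices/system/cpu/cpu0/cache/index2/size` and so on give the sizes of cpu0's caches. Note that `indexN` is not the cache level itself: each directory also has a `level` file (and a `type` file), and on a typical x86 machine `index0` and `index1` are the L1 data and instruction caches, `index2` is L2 and `index3` is L3.

Adding Alpine is a good start; we can always re-evaluate if we find any other distro-specific issues.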
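A minimal C sketch of the `/sys/devices/system/cpu` approach discussed above, assuming the usual Linux layout where each `cpu0/cache/indexN` directory holds `level` and `size` files and sizes look like `32K`. Because `indexN` does not map directly to a cache level, the sketch reads `level` and keeps the size of the highest-level cache it finds. The helper names (`parse_cache_size`, `read_first_line`, `largest_cache_size_sysfs`) are hypothetical; this is not the code that was merged into the runtime.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a sysfs cache size such as "32K\n" or "8M"
   into bytes. Returns 0 if the text does not start with digits. */
static size_t parse_cache_size(const char *text)
{
    char *end = NULL;
    unsigned long long value = strtoull(text, &end, 10);
    if (end == text)
        return 0; /* no digits at all */
    switch (toupper((unsigned char)*end)) {
    case 'K': value *= 1024ULL; break;
    case 'M': value *= 1024ULL * 1024ULL; break;
    case 'G': value *= 1024ULL * 1024ULL * 1024ULL; break;
    default: break; /* plain byte count */
    }
    return (size_t)value;
}

/* Hypothetical helper: read the first line of a small file into buf.
   Returns 0 on success, -1 if the file is missing or empty. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    return ok != NULL ? 0 : -1;
}

/* Hypothetical helper: walk cache_dir/index0, index1, ... and return the
   size in bytes of the highest-level cache found, or 0 if none is readable. */
static size_t largest_cache_size_sysfs(const char *cache_dir)
{
    assert(cache_dir != NULL);
    size_t best_size = 0;
    long best_level = 0;
    char path[512];
    char line[64];

    for (int index = 0; index < 32; index++) {
        snprintf(path, sizeof path, "%s/index%d/level", cache_dir, index);
        if (read_first_line(path, line, sizeof line) != 0)
            break; /* no more cache descriptors */
        long level = strtol(line, NULL, 10);

        snprintf(path, sizeof path, "%s/index%d/size", cache_dir, index);
        if (read_first_line(path, line, sizeof line) != 0)
            continue; /* level present but size unreadable: skip it */
        size_t size = parse_cache_size(line);

        /* Prefer the highest level; among equal levels keep the larger size. */
        if (level > best_level || (level == best_level && size > best_size)) {
            best_level = level;
            best_size = size;
        }
    }
    return best_size;
}
```

On a real machine you would call it as `largest_cache_size_sysfs("/sys/devices/system/cpu/cpu0/cache")`. If that directory is missing or unreadable, the sketch returns 0, so the caller still has to decide on its own default.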
Others are better positioned to answer that one; off the top of my head, I cannot remember any Linux regressions that wouldn't show up on either Ubuntu or Alpine.
IMO, the better fix would be one that keeps support for non-Linux Unix-like operating systems (macOS, FreeBSD, SunOS and so forth) intact.
It repros without WSL2 too.
The problem is that the GC is running roughly 100x more often than it should. It is likely a problem in the budget computation, since the gen0 budget is derived from the reported CPU cache size. One of the places to check is `PAL_GetLogicalProcessorCacheSizeFromOS`.

Alpine is different from other distros because it uses musl libc instead of GNU libc. For this particular fix, constants such as `_SC_LEVEL1_DCACHE_SIZE` are not defined by musl, so the fallback that uses a different method to retrieve the cache size was mostly Alpine-specific (there are several such subtle differences on Alpine).

@danmosemsft Yes, we have an email discussion about adding additional OS coverage in the perf lab. We will look into @adamsitnik's finalized report of the exercise data, and I'm trying to get .NET Core OS usage telemetry data. Hopefully we can identify the commonly used OSes and add them to the perf lab.
@danmosemsft Yes, it was very helpful in pinpointing where the issue might be. @adamsitnik Yes, I will create a separate issue to track how we can add tests/asserts for this.
I investigated this further, and it appears that none of the `_SC_LEVEL1_DCACHE_SIZE` family of constants are defined on Alpine (musl), so `PAL_GetLogicalProcessorCacheSizeFromOS` will always return 0. I wonder if https://github.com/dotnet/runtime/pull/34488 caused the regression, since I notice this case is missing compared to 3.1.
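The musl behavior described above can be illustrated with a short C sketch, assuming a hypothetical helper `largest_cache_size_sysconf` (not the PAL's actual implementation). The `_SC_LEVEL*_CACHE_SIZE` names are glibc extensions, so each `sysconf` query is guarded with `#ifdef`; under musl every guard is false and the function can only return 0.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Keep the larger of the current best and a sysconf() result.
   sysconf() reports -1 (or 0) when a value is unknown, so ignore those. */
static size_t keep_larger(size_t current, long candidate)
{
    if (candidate > 0 && (size_t)candidate > current)
        return (size_t)candidate;
    return current;
}

/* Nonzero when this libc exposes the glibc cache-size names. */
static int libc_has_cache_sysconf(void)
{
#ifdef _SC_LEVEL1_DCACHE_SIZE
    return 1;
#else
    return 0; /* e.g. musl */
#endif
}

/* Hypothetical helper: largest cache size in bytes reported by sysconf(),
   or 0. Without the _SC_LEVEL* names every query is compiled out. */
static size_t largest_cache_size_sysconf(void)
{
    size_t largest = 0;
#ifdef _SC_LEVEL1_DCACHE_SIZE
    largest = keep_larger(largest, sysconf(_SC_LEVEL1_DCACHE_SIZE));
#endif
#ifdef _SC_LEVEL2_CACHE_SIZE
    largest = keep_larger(largest, sysconf(_SC_LEVEL2_CACHE_SIZE));
#endif
#ifdef _SC_LEVEL3_CACHE_SIZE
    largest = keep_larger(largest, sysconf(_SC_LEVEL3_CACHE_SIZE));
#endif
#ifdef _SC_LEVEL4_CACHE_SIZE
    largest = keep_larger(largest, sysconf(_SC_LEVEL4_CACHE_SIZE));
#endif
    /* Invariant: if the names are missing, nothing could have been queried. */
    assert(libc_has_cache_sysconf() || largest == 0);
    return largest;
}
```

A caller that gets 0 back would then need a second source, such as the sysfs files mentioned earlier in the thread, rather than treating 0 as a real cache size.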