runtime: [ARM64] Performance regression: Sorting arrays of primitive types
After running benchmarks for 3.1 vs 5.0 on the “Ubuntu arm64 Qualcomm Machines” owned by the JIT Team, I’ve found a few regressions related to sorting.
They may be related to the fact that we moved the sorting implementation from native to managed code (cc @stephentoub).
@DrewScoggins, is there any way to see the full historical data for ARM64?
Repro
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --architecture arm64 --filter 'System.Collections.Sort<Int32>.Array' 'System.Collections.Sort<Int32>.List'
System.Collections.Sort<Int32>.Array(Size: 512)
Result | Base (ns) | Diff (ns) | Ratio | Alloc Delta | Modality | Operating System | Bit | Processor Name | Base Version | Diff Version |
---|---|---|---|---|---|---|---|---|---|---|
Same | 4500.17 | 4250.27 | 1.06 | +0 | | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | 3.1.6 | 5.0.20.41714 |
Faster | 4129.79 | 3573.10 | 1.16 | +0 | several? | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
Faster | 4197.36 | 3672.08 | 1.14 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40416 |
Faster | 5247.31 | 4572.02 | 1.15 | +0 | multimodal | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
Same | 6560.47 | 6483.01 | 1.01 | +0 | bimodal | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | 3.1.6 | 5.0.20.40416 |
Same | 3315.74 | 3404.51 | 0.97 | +0 | | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
Slower | 4821.34 | 8602.70 | 0.56 | +0 | several? | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | 3.1.6 | 5.0.20.41714 |
Faster | 6214.65 | 3992.76 | 1.56 | +0 | | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
Faster | 6441.58 | 4187.34 | 1.54 | +0 | | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
Same | 5831.01 | 5518.00 | 1.06 | +0 | bimodal | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | 3.1.6 | 5.0.20.41714 |
Same | 5602.64 | 5486.66 | 1.02 | +0 | bimodal | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.41714 |
Same | 6178.62 | 5918.08 | 1.04 | +0 | bimodal | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
Slower | 16288.81 | 27067.03 | 0.60 | +0 | bimodal | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
Slower | 16150.43 | 27030.75 | 0.60 | +0 | bimodal | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.7 | 5.0.20.41714 |
Slower | 16139.02 | 26145.59 | 0.62 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
Slower | 14011.07 | 17551.48 | 0.80 | +0 | | ubuntu 18.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
Faster | 7375.30 | 4817.14 | 1.53 | +0 | several? | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
Same | 7244.05 | 7548.09 | 0.96 | +0 | several? | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz | 3.1.6 | 5.0.20.40416 |
Faster | 7932.56 | 5120.47 | 1.55 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
Faster | 7060.74 | 4554.61 | 1.55 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
Faster | 7145.76 | 4761.73 | 1.50 | +0 | | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40203 |
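
Reading the tables: Ratio is Base / Diff. Values above 1.0 mean 5.0 (Diff) was faster than 3.1 (Base), and values below 1.0 mean it was slower. The Arm64 Ubuntu 16.04 rows sit at about 0.60, so 5.0 takes roughly 1.66x as long there. The Python sketch below recomputes the ratio for a few rows copied from the Array table and flags regressions using a hypothetical 0.9 cutoff. Treat the cutoff as illustrative only. The table's own Faster/Same/Slower labels evidently come from a different rule, since the List table marks a 0.89 row as Same. Times are assumed to be in nanoseconds.

```python
# Ratio column sketch: Ratio = Base / Diff, so > 1.0 means 5.0 was faster
# and < 1.0 means 5.0 was slower. Rows are copied from the
# Sort<Int32>.Array(Size: 512) table; times are assumed to be in ns.

ROWS = [
    # (operating system, bit, base_ns, diff_ns, reported_ratio)
    ("Windows 10.0.19041.388", "X64", 4500.17, 4250.27, 1.06),
    ("ubuntu 18.04", "X64", 6214.65, 3992.76, 1.56),
    ("ubuntu 16.04", "Arm64", 16288.81, 27067.03, 0.60),
    ("ubuntu 18.04", "Arm64", 14011.07, 17551.48, 0.80),
    ("Windows 10.0.18363.1016", "Arm", 7244.05, 7548.09, 0.96),
]


def ratio(base_ns: float, diff_ns: float) -> float:
    """Base / Diff, rounded to two decimals like the table."""
    return round(base_ns / diff_ns, 2)


def regressions(rows, cutoff=0.9):
    """Rows whose ratio is below `cutoff` (a hypothetical threshold)."""
    return [r for r in rows if ratio(r[2], r[3]) < cutoff]


if __name__ == "__main__":
    for os_name, bit, base, diff, reported in ROWS:
        computed = ratio(base, diff)
        print(f"{bit:6} {os_name:26} ratio={computed:.2f} (table: {reported:.2f})")
    for os_name, bit, *_ in regressions(ROWS):
        print(f"regression: {bit} on {os_name}")
```

With these rows, only the two Arm64 Ubuntu entries fall below the cutoff. That matches the pattern the report describes.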
System.Collections.Sort<Int32>.List(Size: 512)
Result | Base (ns) | Diff (ns) | Ratio | Alloc Delta | Modality | Operating System | Bit | Processor Name | Base Version | Diff Version |
---|---|---|---|---|---|---|---|---|---|---|
Faster | 4520.64 | 4071.51 | 1.11 | +0 | | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | 3.1.6 | 5.0.20.41714 |
Faster | 4332.89 | 3567.66 | 1.21 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
Faster | 6967.61 | 3518.82 | 1.98 | +0 | | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40416 |
Faster | 5015.13 | 4448.40 | 1.13 | +0 | bimodal | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
Same | 7293.88 | 7863.56 | 0.93 | +0 | several? | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | 3.1.6 | 5.0.20.40416 |
Same | 3339.08 | 3267.09 | 1.02 | +0 | | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
Slower | 4806.28 | 8450.61 | 0.57 | +0 | bimodal | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | 3.1.6 | 5.0.20.41714 |
Faster | 6222.44 | 3938.96 | 1.58 | +0 | | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | 3.1.6 | 5.0.20.40203 |
Faster | 6146.14 | 4254.57 | 1.44 | +0 | | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
Same | 5834.00 | 5493.04 | 1.06 | +0 | | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | 3.1.6 | 5.0.20.41714 |
Same | 5571.07 | 6275.42 | 0.89 | +0 | several? | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.41714 |
Same | 6217.53 | 6206.80 | 1.00 | +0 | multimodal | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | 3.1.6 | 5.0.20.40416 |
Slower | 16281.08 | 27217.40 | 0.60 | +0 | bimodal | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
Slower | 16329.85 | 27111.99 | 0.60 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.7 | 5.0.20.41714 |
Slower | 16395.71 | 26771.27 | 0.61 | +0 | | ubuntu 16.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
Slower | 7791.41 | 10397.92 | 0.75 | +0 | several? | ubuntu 18.04 | Arm64 | Unknown processor | 3.1.6 | 5.0.20.41714 |
Faster | 7371.80 | 4824.34 | 1.53 | +0 | | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40416 |
Same | 7002.96 | 7065.54 | 0.99 | +0 | several? | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz | 3.1.6 | 5.0.20.40416 |
Faster | 7976.12 | 5098.72 | 1.56 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
Faster | 7114.58 | 4576.65 | 1.55 | +0 | | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | 3.1.6 | 5.0.20.41714 |
Faster | 7157.62 | 4735.35 | 1.51 | +0 | | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | 3.1.6 | 5.0.20.40203 |
/cc @JulieLeeMSFT
category:cq theme:needs-triage skill-level:expert cost:large
About this issue
- State: closed
- Created 4 years ago
- Comments: 28 (28 by maintainers)
It looks like we regained the missing perf with #43870.
6.0 data now shows about 16 µs, which is roughly where we were in the 3.1 timeframe.
The List case shows similar improvements.
Going to close this one as fixed.
Linux arm64. Probably Windows arm64 too, but we don’t track that.
I can repro this on an Rpi4:
What’s the difference between the two tables? Is it just two different sets of runs?
Never mind – List vs Array.