runtime: [Perf] Regressions in Integer Formatting
Run Information
Architecture | arm64 |
---|---|
OS | Windows 10.0.19041 |
Baseline | 4c7f4ee74d7a9e5bcc73b552dd020df02b039c8a |
Compare | 8fb2eb53f75971ce492f14bebb93ed7a236bcd6e |
Diff | Diff |
Regressions in System.Tests.Perf_Int32
Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|
ToString - Duration of single invocation | 24.70 ns | 34.44 ns | 1.39 | 0.21 |
TryFormat - Duration of single invocation | 19.39 ns | 21.12 ns | 1.09 | 0.17 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_Int32*'
Payloads
Histogram
System.Tests.Perf_Int32.ToString(value: 2147483647)
System.Tests.Perf_Int32.TryFormat(value: 2147483647)
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
Run Information
Architecture | arm64 |
---|---|
OS | Windows 10.0.19041 |
Baseline | 4c7f4ee74d7a9e5bcc73b552dd020df02b039c8a |
Compare | 8fb2eb53f75971ce492f14bebb93ed7a236bcd6e |
Diff | Diff |
Regressions in System.Tests.Perf_UInt64
Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|
TryFormat - Duration of single invocation | 10.78 ns | 12.11 ns | 1.12 | 0.13 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_UInt64*'
Payloads
Histogram
System.Tests.Perf_UInt64.TryFormat(value: 12345)
Run Information
Architecture | arm64 |
---|---|
OS | Windows 10.0.19041 |
Baseline | 4c7f4ee74d7a9e5bcc73b552dd020df02b039c8a |
Compare | 8fb2eb53f75971ce492f14bebb93ed7a236bcd6e |
Diff | Diff |
Regressions in System.Tests.Perf_Version
Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|
TryFormatL - Duration of single invocation | 96.02 ns | 103.50 ns | 1.08 | 0.15 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_Version*'
Payloads
Histogram
System.Tests.Perf_Version.TryFormatL
Run Information
Architecture | arm64 |
---|---|
OS | Windows 10.0.19041 |
Baseline | 28e63279342bd2f6ca43442d864f487613a53bc9 |
Compare | d49bcbe0441f5c954cddcbe28a222eb34917bcaf |
Diff | Diff |
Regressions in System.Tests.Perf_UInt32
Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
---|---|---|---|---|---|---|---|---|---|
ToString - Duration of single invocation | 22.92 ns | 25.28 ns | 1.10 | 0.24 |
TryFormat - Duration of single invocation | 17.54 ns | 20.77 ns | 1.18 | 0.01 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_UInt32*'
Payloads
Histogram
System.Tests.Perf_UInt32.ToString(value: 4294967295)
System.Tests.Perf_UInt32.TryFormat(value: 4294967295)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 21 (20 by maintainers)
As a side note, while looking at the diff I noticed some other general improvements that could be made in the future:
can become
and
can become
lastly
can become
and probably want to move the constant out of the loop.
I think this is the correct culprit. The loop has a long dependency chain off the value coming out of the `mul`/`umull`, with few independent instructions, so it's bound by these two, and 64-bit `mul` has twice the latency of `umull`.

64-bit `mul` has a latency of `4(3)` and `umull` has `2(1)` on Cortex-A76. Were you perhaps comparing the AArch32 instructions? Note that on AArch64 `mul` and `umull` aren't real instructions but are architectural aliases for `madd` and `umaddl` respectively, so you need to look at the latencies for those.

The `mul` seems to be unneeded though: both inputs are 32 bits, so you should be able to just use `umull` there as before.

Test history implicates #52893. In particular, the change was merged, reverted, and then re-merged, which matches the up-down-up pattern seen in that history.
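
Returning to the `mul`/`umull` point: a minimal sketch of why a widening 32x32→64 multiply is enough when both inputs are 32 bits. It assumes the hot loop is the usual decimal digit loop that divides by 10, and that the divide is lowered to a multiply by a reciprocal constant; neither is confirmed in this thread. The constant `0xCCCCCCCD` and shift `35` are the standard values for unsigned 32-bit division by 10, and the function names are hypothetical.

```python
MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10); fits in 32 bits
SHIFT = 35
U32_MAX = 0xFFFFFFFF
U64_MAX = 0xFFFFFFFFFFFFFFFF


def div10_u32(x: int) -> int:
    """Divide an unsigned 32-bit value by 10 with one multiply and one shift."""
    assert 0 <= x <= U32_MAX
    product = x * MAGIC        # both operands fit in 32 bits
    assert product <= U64_MAX  # so the full product fits in 64 bits
    return product >> SHIFT


def to_decimal(x: int) -> str:
    """Format an unsigned 32-bit value, least significant digit first."""
    digits = []
    while True:
        q = div10_u32(x)                             # depends on previous quotient
        digits.append(chr(ord("0") + (x - q * 10)))  # remainder, no second divide
        x = q
        if x == 0:
            break
    return "".join(reversed(digits))


print(to_decimal(4294967295))
```

Each iteration needs the quotient from the previous one, so the multiply sits on the loop-carried dependency chain; that is consistent with the observation above that the loop is bound by the multiply latency.
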
As for how to get diffs, it should be something like: