runtime: [Perf] Regressions in Integer Formatting
Run Information
| Architecture | arm64 |
|---|---|
| OS | Windows 10.0.19041 |
| Baseline | 4c7f4ee74d7a9e5bcc73b552dd020df02b039c8a |
| Compare | 8fb2eb53f75971ce492f14bebb93ed7a236bcd6e |
| Diff | Diff |
Regressions in System.Tests.Perf_Int32
| Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
|---|---|---|---|---|---|---|---|---|---|
| ToString - Duration of single invocation | 24.70 ns | 34.44 ns | 1.39 | 0.21 | | | | | |
| TryFormat - Duration of single invocation | 19.39 ns | 21.12 ns | 1.09 | 0.17 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_Int32*'
Payloads
Histogram
System.Tests.Perf_Int32.ToString(value: 2147483647)
System.Tests.Perf_Int32.TryFormat(value: 2147483647)
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository
Run Information
| Architecture | arm64 |
|---|---|
| OS | Windows 10.0.19041 |
| Baseline | 4c7f4ee74d7a9e5bcc73b552dd020df02b039c8a |
| Compare | 8fb2eb53f75971ce492f14bebb93ed7a236bcd6e |
| Diff | Diff |
Regressions in System.Tests.Perf_UInt64
| Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
|---|---|---|---|---|---|---|---|---|---|
| TryFormat - Duration of single invocation | 10.78 ns | 12.11 ns | 1.12 | 0.13 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_UInt64*'
Payloads
Histogram
System.Tests.Perf_UInt64.TryFormat(value: 12345)
Run Information
| Architecture | arm64 |
|---|---|
| OS | Windows 10.0.19041 |
| Baseline | 4c7f4ee74d7a9e5bcc73b552dd020df02b039c8a |
| Compare | 8fb2eb53f75971ce492f14bebb93ed7a236bcd6e |
| Diff | Diff |
Regressions in System.Tests.Perf_Version
| Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
|---|---|---|---|---|---|---|---|---|---|
| TryFormatL - Duration of single invocation | 96.02 ns | 103.50 ns | 1.08 | 0.15 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_Version*'
Payloads
Histogram
System.Tests.Perf_Version.TryFormatL
Run Information
| Architecture | arm64 |
|---|---|
| OS | Windows 10.0.19041 |
| Baseline | 28e63279342bd2f6ca43442d864f487613a53bc9 |
| Compare | d49bcbe0441f5c954cddcbe28a222eb34917bcaf |
| Diff | Diff |
Regressions in System.Tests.Perf_UInt32
| Benchmark | Baseline | Test | Test/Base | Test Quality | Baseline IR | Compare IR | IR Ratio | Baseline ETL | Compare ETL |
|---|---|---|---|---|---|---|---|---|---|
| ToString - Duration of single invocation | 22.92 ns | 25.28 ns | 1.10 | 0.24 | | | | | |
| TryFormat - Duration of single invocation | 17.54 ns | 20.77 ns | 1.18 | 0.01 |
Historical Data in Reporting System
Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Tests.Perf_UInt32*'
Payloads
Histogram
System.Tests.Perf_UInt32.ToString(value: 4294967295)
System.Tests.Perf_UInt32.TryFormat(value: 4294967295)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 21 (20 by maintainers)
As a side note, while looking at the diff I noticed some other general improvements that could be made in the future:
can become
and
can become
lastly
can become
and probably want to move the constant out of the loop.
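The disassembly from this thread did not survive, so here is a minimal C sketch of the kind of loop the `mul`/`umull` discussion is about. It assumes the hot loop is the usual divide-by-10 digit extraction, with the division compiled into a multiply by a reciprocal constant. The names `div10_u32` and `format_u32` are hypothetical and are not the runtime's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: divide by 10 with a widening 32x32->64 multiply
 * followed by a shift. 0xCCCCCCCD is ceil(2^35 / 10), and the result is
 * exact for every uint32_t input. Both operands fit in 32 bits, so a
 * widening multiply (umull/umaddl on AArch64) is sufficient. */
static uint32_t div10_u32(uint32_t value)
{
    const uint64_t recip10 = 0xCCCCCCCDu; /* loop-invariant constant */
    return (uint32_t)(((uint64_t)value * recip10) >> 35);
}

/* Hypothetical formatter: writes the decimal digits of value into buf
 * (cap must be at least 11) and returns the digit count, excluding NUL. */
static size_t format_u32(uint32_t value, char *buf, size_t cap)
{
    char tmp[10]; /* a uint32_t has at most 10 decimal digits */
    size_t n = 0;

    assert(cap >= 11);
    do {
        /* Each iteration needs the previous quotient, so the multiply
         * latency sits directly on the loop's dependency chain. */
        uint32_t q = div10_u32(value);
        tmp[n++] = (char)('0' + (value - q * 10u));
        value = q;
    } while (value != 0);

    /* Digits were produced least significant first; reverse them. */
    for (size_t i = 0; i < n; i++) {
        buf[i] = tmp[n - 1 - i];
    }
    buf[n] = '\0';
    return n;
}
```

Because every iteration consumes the quotient from the one before it, there is little independent work to hide the multiply latency. That is why the choice between a widening 32-bit multiply and a full 64-bit multiply could plausibly show up in these benchmarks.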
I think this is the correct culprit. The loop has a long dependency chain off the value coming out of the `mul`/`umull`, with few independent instructions, so it's bound by these two, and 64-bit `mul` has twice the latency of `umull`.

64-bit `mul` has a latency of `4(3)` and `umull` has `2(1)` on Cortex-A76. Were you perhaps comparing the AArch32 instructions? Note that on AArch64 `mul` and `umull` aren't real instructions but are architectural aliases for `madd` and `umaddl` respectively, so you need to look at the latencies for those.

The `mul` seems to be unneeded though: both inputs are 32 bits, so you should be able to just use `umull` there as before.

Test history implicates #52893. In particular, the change was merged, reverted, and then re-merged, which matches the up-down-up seen here:
As for how to get diffs, it should be something like: