runtime: tiering makes regex-redux significantly slower
I noticed in the public numbers that source-generated regexes (6.cs) are significantly slower than ref-emit compiled regexes (5.cs), even though source-generated mode doesn’t have to emit and compile any code at runtime.
This doesn’t show up in BenchmarkDotNet, only on the command line. With default settings, generated is 16% slower:
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll gen"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll gen
Time (mean ± σ): 1.473 s ± 0.050 s [User: 1.159 s, System: 0.228 s]
Range (min … max): 1.383 s … 1.521 s 20 runs
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll reg"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll reg
Time (mean ± σ): 1.273 s ± 0.014 s [User: 0.967 s, System: 0.237 s]
Range (min … max): 1.262 s … 1.318 s 20 runs
Disabling tiered compilation makes the difference go away. Both get faster (generated by about 28%, compiled by about 15%), and generated is slightly faster than compiled, as expected:
C:\proj\rr>set DOTNET_TieredCompilation=0
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll gen"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll gen
Time (mean ± σ): 1.064 s ± 0.019 s [User: 0.733 s, System: 0.237 s]
Range (min … max): 1.045 s … 1.119 s 20 runs
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll reg"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll reg
Time (mean ± σ): 1.079 s ± 0.007 s [User: 0.758 s, System: 0.236 s]
Range (min … max): 1.068 s … 1.098 s 20 runs
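One way to see whether warm-up is a factor, without relying on an external tool, is to time successive passes inside a single process. The following is only a rough sketch under stated assumptions: the input string and pattern are made up for illustration (the real benchmark reads a FASTA file), and the number of passes is arbitrary. If later passes are noticeably faster than the first few, that would be consistent with tiering, and it may also explain why a harness that warms up before measuring does not show the gap.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: a repeated fragment that contains both alternatives.
string input = string.Concat(Enumerable.Repeat("agggtaaacattttaccct", 50_000));

// Illustrative pattern; swap in the regex under investigation.
var regex = new Regex("agggtaaa|tttaccct", RegexOptions.Compiled);

for (int pass = 0; pass < 10; pass++)
{
    // Time each pass separately so early (cold) and late (warm) passes can be compared.
    var sw = Stopwatch.StartNew();
    int count = regex.Matches(input).Count;
    sw.Stop();
    Console.WriteLine($"pass {pass}: {count} matches in {sw.Elapsed.TotalMilliseconds:F1} ms");
}
```

Running this once with default settings and once with DOTNET_TieredCompilation=0 set, as in the transcripts above, gives a per-pass view to put next to the whole-process hyperfine numbers.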
Standalone repro: clone https://github.com/danmoseley/repro1.git and run repro.bat. hyperfine is just a convenient way to benchmark an exe; feel free to use something else.
Is there any way to improve this, or is this just a limitation that shows up with a short-lived app like this?
== what follows is not relevant to this issue but just for comparison ==
BTW, for what it’s worth, here are the Native AOT numbers. I guess the regular configuration here is actually forced to the interpreter.
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe gen"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe gen
Time (mean ± σ): 1.094 s ± 0.037 s [User: 0.533 s, System: 0.335 s]
Range (min … max): 1.030 s … 1.161 s 20 runs
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe reg"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe reg
Time (mean ± σ): 1.393 s ± 0.021 s [User: 0.888 s, System: 0.204 s]
Range (min … max): 1.363 s … 1.435 s 20 runs
And here are the interpreter ("non") and NonBacktracking ("nbt") numbers using Native AOT. I don’t know why the interpreter is slower than “compiled” if the latter is using the interpreter as well.
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe non"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe non
Time (mean ± σ): 3.514 s ± 0.026 s [User: 3.210 s, System: 0.207 s]
Range (min … max): 3.476 s … 3.576 s 20 runs
C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe nbt"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe nbt
Time (mean ± σ): 1.661 s ± 0.059 s [User: 1.353 s, System: 0.214 s]
Range (min … max): 1.606 s … 1.789 s 20 runs
About this issue
- State: open
- Created a year ago
- Comments: 26 (26 by maintainers)
Oops, added. But all it was doing was running the apps.
In a variety of places we assume “Compiled” isn’t literally “the only thing that’s different is emitting MSIL” but rather “you’re asking us to take more time to optimize throughput”, and as such there are optimizations performed when Compiled is set that aren’t related to emitting MSIL, like spending more time analyzing sets to determine the most optimal thing to search for as part of finding a starting position. I’d bet if you were to debug through you’d find the RegexFindOptimizations is different when you set Compiled vs None.
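For reference, the modes compared in this thread can be requested roughly as follows. This is a minimal sketch, not the code from 5.cs or 6.cs: the pattern, the input, and the `Variants` class and member names are illustrative assumptions. It needs .NET 7 or later for `[GeneratedRegex]`, `RegexOptions.NonBacktracking`, and `Regex.Count`.

```csharp
using System;
using System.Text.RegularExpressions;

public static partial class Variants
{
    // Illustrative pattern only; the benchmark's actual patterns may differ.
    private const string Pattern = "agggtaaa|tttaccct";

    // Interpreter: RegexOptions.None, no IL emitted at run time.
    public static readonly Regex Interpreted = new Regex(Pattern, RegexOptions.None);

    // Ref-emit compiled: asks the library to spend more construction time for throughput.
    public static readonly Regex Compiled = new Regex(Pattern, RegexOptions.Compiled);

    // NonBacktracking engine.
    public static readonly Regex NonBacktracking = new Regex(Pattern, RegexOptions.NonBacktracking);

    // Source generated: the matching code is produced at build time.
    [GeneratedRegex(Pattern)]
    public static partial Regex Generated();
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical input containing both alternatives.
        const string input = "ccagggtaaaccttttaccctgg";

        // All four variants should report the same number of matches.
        Console.WriteLine($"interpreted:     {Variants.Interpreted.Count(input)}");
        Console.WriteLine($"compiled:        {Variants.Compiled.Count(input)}");
        Console.WriteLine($"nonbacktracking: {Variants.NonBacktracking.Count(input)}");
        Console.WriteLine($"generated:       {Variants.Generated().Count(input)}");
    }
}
```

The point made in the comment above is that `RegexOptions.Compiled` and `RegexOptions.None` can differ in more than whether IL is emitted, so comparing the first two variants does not isolate the cost of the interpreter alone.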