runtime: tiering makes regex-redux significantly slower

I noticed in the public numbers that source-generated regexes (6.cs) are significantly slower than ref-emit compiled regexes (5.cs), even though source-generated mode doesn’t have to emit and compile any code at runtime.

This doesn’t show up in BenchmarkDotNet, only when running from the command line. With default settings, generated (gen) is about 16% slower than compiled (reg):

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll gen"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll gen
  Time (mean ± σ):      1.473 s ±  0.050 s    [User: 1.159 s, System: 0.228 s]
  Range (min … max):    1.383 s …  1.521 s    20 runs

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll reg"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll reg
  Time (mean ± σ):      1.273 s ±  0.014 s    [User: 0.967 s, System: 0.237 s]
  Range (min … max):    1.262 s …  1.318 s    20 runs

Disabling tiered compilation makes the difference go away: both get faster (gen by about 28%, reg by about 15%), and generated is slightly faster than compiled, as expected:

C:\proj\rr>set DOTNET_TieredCompilation=0

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll gen"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll gen
  Time (mean ± σ):      1.064 s ±  0.019 s    [User: 0.733 s, System: 0.237 s]
  Range (min … max):    1.045 s …  1.119 s    20 runs

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "dotnet exec bin\release\net8.0\rr.dll reg"
Benchmark 1: dotnet exec bin\release\net8.0\rr.dll reg
  Time (mean ± σ):      1.079 s ±  0.007 s    [User: 0.758 s, System: 0.236 s]
  Range (min … max):    1.068 s …  1.098 s    20 runs
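
As a sanity check on the percentages, the means above can be compared directly. This is a small Python sketch, not part of the repro; the dictionary keys and the `pct_change` helper are names made up for illustration, and the numbers are copied from the hyperfine output above.

```python
# Mean wall-clock times in seconds, copied from the hyperfine runs above.
means = {
    ("tiered", "gen"): 1.473,
    ("tiered", "reg"): 1.273,
    ("no_tiering", "gen"): 1.064,
    ("no_tiering", "reg"): 1.079,
}


def pct_change(new, old):
    """Percentage change of `new` relative to `old` (negative means faster)."""
    return (new - old) / old * 100.0


# How much slower or faster is gen than reg in each configuration?
gen_vs_reg_tiered = pct_change(means[("tiered", "gen")], means[("tiered", "reg")])
gen_vs_reg_untiered = pct_change(means[("no_tiering", "gen")], means[("no_tiering", "reg")])

# How much does each mode gain from DOTNET_TieredCompilation=0?
gen_gain = pct_change(means[("no_tiering", "gen")], means[("tiered", "gen")])
reg_gain = pct_change(means[("no_tiering", "reg")], means[("tiered", "reg")])

print(f"gen vs reg, tiering on:  {gen_vs_reg_tiered:+.1f}%")
print(f"gen vs reg, tiering off: {gen_vs_reg_untiered:+.1f}%")
print(f"gen, tiering off vs on:  {gen_gain:+.1f}%")
print(f"reg, tiering off vs on:  {reg_gain:+.1f}%")
```

With tiering off, the gap between gen and reg (0.015 s) is smaller than the standard deviation hyperfine reports for gen (0.019 s), so "slightly faster" should be read loosely.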

Standalone repro: clone https://github.com/danmoseley/repro1.git and run repro.bat. hyperfine is used only as a convenient way to benchmark an exe; feel free to use something else.

Is there any way to improve this, or is this just a limitation that shows up with a short lived app like this?

== what follows is not relevant to this issue but just for comparison ==

BTW, for what it’s worth, here are the native AOT numbers. I guess the “reg” (compiled) configuration is actually forced to use the interpreter here.

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe gen"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe gen
  Time (mean ± σ):      1.094 s ±  0.037 s    [User: 0.533 s, System: 0.335 s]
  Range (min … max):    1.030 s …  1.161 s    20 runs

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe reg"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe reg
  Time (mean ± σ):      1.393 s ±  0.021 s    [User: 0.888 s, System: 0.204 s]
  Range (min … max):    1.363 s …  1.435 s    20 runs

And here are the interpreter (non) and non-backtracking (nbt) numbers under native AOT. I don’t know why the interpreter is slower than “compiled” if the latter is using the interpreter as well.

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe non"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe non
  Time (mean ± σ):      3.514 s ±  0.026 s    [User: 3.210 s, System: 0.207 s]
  Range (min … max):    3.476 s …  3.576 s    20 runs

C:\proj\rr>\t\hyperfine\hyperfine.exe --warmup 1 --min-runs 20 "C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe nbt"
Benchmark 1: C:\proj\rr\bin\release\net8.0\win-x64\publish\rr.exe nbt
  Time (mean ± σ):      1.661 s ±  0.059 s    [User: 1.353 s, System: 0.214 s]
  Range (min … max):    1.606 s …  1.789 s    20 runs
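
To put the four native AOT results side by side, here is a rough Python sketch that ranks them by slowdown relative to the fastest mode. It is not part of the repro; the mode names are the command-line arguments used above, and the variable names are hypothetical.

```python
# Mean wall-clock times in seconds for the native AOT runs above.
aot_means = {
    "gen": 1.094,  # source generated
    "reg": 1.393,  # RegexOptions.Compiled
    "non": 3.514,  # interpreter
    "nbt": 1.661,  # non-backtracking
}

fastest = min(aot_means.values())

# Slowdown factor of each mode relative to the fastest one, sorted ascending.
ranking = sorted(
    ((mode, mean / fastest) for mode, mean in aot_means.items()),
    key=lambda pair: pair[1],
)

for mode, factor in ranking:
    print(f"{mode}: {factor:.2f}x")
```

On these numbers the interpreter run (non) takes roughly 2.5 times as long as the “compiled” run (reg), which is the gap the question above is about.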

About this issue

State: open
Created a year ago
Comments: 26 (26 by maintainers)

Most upvoted comments

Oops, added. But all it was doing was running the apps.

danmoseley on Jun 19, 2023

any idea why RegexOptions.Compiled under native AOT is a lot faster than RegexOptions.None

In a variety of places we assume “Compiled” isn’t literally “the only thing that’s different is emitting MSIL” but rather “you’re asking us to take more time to optimize throughput”. As such, there are optimizations performed when Compiled is set that aren’t related to emitting MSIL, like spending more time analyzing sets to determine the best thing to search for as part of finding a starting position. I’d bet that if you were to debug through, you’d find that RegexFindOptimizations differs when you set Compiled vs. None.

stephentoub on Jun 19, 2023