bat: slow output, with pauses, with C# source code
Something strange is going on. On a 1500-line .cs file, bat takes ~7s to list the file. This is with --pager none, both with and without -p (not much difference with -p). I can see a pause after every 1-2 dozen lines are output. --color never also doesn’t make a difference (bench via hyperfine):
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `cmd /c type EntityComponentStore.cs` | 86.1 ± 7.3 | 75.9 | 101.1 | 1.0 |
| `bat --pager none EntityComponentStore.cs` | 7734.8 ± 586.6 | 6857.4 | 8700.1 | 89.9 |
| `bat --pager none -p EntityComponentStore.cs` | 6982.9 ± 464.3 | 6290.5 | 7889.4 | 81.1 |
| `bat --color never --pager none EntityComponentStore.cs` | 7155.5 ± 460.3 | 6586.1 | 7804.1 | 83.1 |
| `bat --color never --pager none -p EntityComponentStore.cs` | 7053.2 ± 492.3 | 6409.1 | 8076.1 | 82.0 |
bat 0.12.1, msvc build on Windows, in a regular cmd.exe console window.
I’d expect some slowdown but it seems like something buggy is happening. If I do the same on a cpp file, also ~1500 lines:
| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| `cmd /c type ApiDispatchers.cpp` | 98.4 ± 10.0 | 83.6 | 115.9 | 1.0 |
| `bat --pager none ApiDispatchers.cpp` | 768.5 ± 88.2 | 677.9 | 995.4 | 7.8 |
| `bat --pager none -p ApiDispatchers.cpp` | 683.8 ± 128.9 | 549.4 | 959.0 | 6.9 |
| `bat --color never --pager none ApiDispatchers.cpp` | 480.1 ± 46.7 | 440.2 | 584.2 | 4.9 |
| `bat --color never --pager none -p ApiDispatchers.cpp` | 447.6 ± 42.5 | 414.6 | 542.0 | 4.5 |
Which is much more in line with what I’d expect. Is there something different about how the syntax highlighting for C# is handled? Is it all regexp-based, or based on something else that is expensive?
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (5 by maintainers)
Commits related to this issue
- [C#] Avoid catastrophic backtracking This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern susceptible to catastrophic backtracking due to a combinatori... — committed to sharkdp/Packages by sharkdp 4 years ago
- [C#] Avoid catastrophic backtracking This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old is pattern susceptible to catastrophic backtracking due to a combinat... — committed to sharkdp/Packages by sharkdp 4 years ago
- [C#] Avoid catastrophic backtracking This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern is susceptible to catastrophic backtracking due to a combinat... — committed to sharkdp/Packages by sharkdp 4 years ago
- [C#] Avoid catastrophic backtracking (#2331) This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern is susceptible to catastrophic backtracking due to ... — committed to sublimehq/Packages by sharkdp 4 years ago
- [C#] Avoid catastrophic backtracking (#2331) This simplifies a pattern in the C# syntax while leaving it functionally unchanged. The old pattern is susceptible to catastrophic backtracking due to ... — committed to mitranim/Packages by sharkdp 4 years ago
May I suggest changing that pattern to `\((?=(?:[^,)(]|\([^\)]*\))*,)`? It has the same functionality but without the catastrophic backtracking. I’m not sure when I’ll get time to open a PR on the sublimehq/Packages repo, so if this change works for you, feel free to beat me to it 😃 (I proved that the C# syntax tests aren’t adversely affected, nor is the performance in Sublime Text.)
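As a rough illustration of what the suggested pattern accepts, here is a minimal Python sketch. It uses Python's `re` module rather than Oniguruma, so it only shows matching behaviour, not the performance inside bat or Sublime Text. The helper name `starts_with_comma_list` and the sample inputs are made up for this example.

```python
import re

# Pattern quoted from the comment above: an opening parenthesis that is
# followed, at the same nesting level, by a comma. One level of nested
# parentheses is skipped over by the `\([^\)]*\)` alternative.
PATTERN = re.compile(r"\((?=(?:[^,)(]|\([^\)]*\))*,)")


def starts_with_comma_list(text):
    """Return True if the pattern matches at the very start of `text`."""
    return PATTERN.match(text) is not None


if __name__ == "__main__":
    # Hypothetical sample inputs, not taken from the C# syntax tests.
    for sample in ["(a, b)", "(a)", "(f(x), y)", "(f(x, y))"]:
        print(f"{sample!r}: {starts_with_comma_list(sample)}")

    # A long argument without any comma. The two alternatives can never
    # match the same character, so there is only one way to consume it.
    print(starts_with_comma_list("(" + "b" * 5000 + ")"))
```

The two alternatives inside the lookahead start with disjoint characters (anything except `,`, `)`, `(` versus a literal `(`), which is why a failed match does not fan out into many equivalent ways of splitting the input.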
I’ve looked into this today.
First, I tried to minimize the example file above. I found that lines like this:
take a very long time to parse. This particular one takes ~500 ms and the parsing time depends on the length of the `b…b` and `d…d` strings (in a somewhat complex way).
Next, I took a look at the `C#.sublime-syntax`. The problematic pattern is in the `line_of_code_in_no_semicolon` context:
My (completely non-expert) guess would be that the nested `(something*)*` pattern causes some kind of combinatorial explosion.
Maybe @keith-hall could take a look at this (only if you are interested, of course)? 😊
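To make the `(something*)*` guess concrete, here is a small Python sketch. The pattern `(a+)+b` is a hypothetical stand-in, not the pattern from `C#.sublime-syntax`, and Python's `re` engine is used instead of Oniguruma. Both are backtracking engines, so the shape of the problem should be similar, but the exact timings are not comparable.

```python
import re
import time

# Hypothetical pattern, NOT the one from C#.sublime-syntax: a quantified
# group whose body is itself quantified, i.e. the `(something*)*` shape.
NESTED = re.compile(r"(a+)+b")
# Matches the same strings, but with a single quantifier.
FLAT = re.compile(r"a+b")


def time_failed_match(pattern, n):
    """Time one failed match against n copies of 'a' (no trailing 'b')."""
    subject = "a" * n
    start = time.perf_counter()
    result = pattern.match(subject)
    elapsed = time.perf_counter() - start
    assert result is None  # neither pattern can match without a 'b'
    return elapsed


if __name__ == "__main__":
    for n in (12, 16, 20):
        nested = time_failed_match(NESTED, n)
        flat = time_failed_match(FLAT, n)
        print(f"n={n:2d}  nested={nested:.4f}s  flat={flat:.6f}s")
```

When the match fails, the nested version has to try every way of dividing the run of `a` characters between the inner and outer quantifier, so the work grows quickly with the input length, while the flat version has only one way to consume the run.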
If this pattern is disabled, C# files render reasonably fast:
Most likely it is another instance of a regex pattern used (perhaps often) in the C# syntax definition which has poor performance under the Oniguruma regex engine with many (perhaps simple) inputs. Probably the best way to start debugging this would be to play around in syntect and temporarily add some timers to see which patterns are problematic, then identify whether there is a way to tweak those for better performance and make a PR to the sublimehq/Packages repository with those improvements. I will endeavor to do this when I am able to make time, but it may not be soon, so anyone else who wishes to experiment/investigate: please do 😃
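The "add some timers" idea can be sketched outside of syntect as well. The following Python example is only an assumption about how such an experiment might look: the function `rank_patterns_by_time`, the pattern labels, and the sample lines are all made up, and it times Python's `re` engine rather than Oniguruma.

```python
import re
import time


def rank_patterns_by_time(patterns, lines):
    """Return (name, seconds) pairs, slowest pattern first.

    `patterns` maps a label to a regex source string. Every pattern is
    searched against every line and the total wall-clock time is recorded.
    """
    totals = {}
    for name, source in patterns.items():
        compiled = re.compile(source)
        start = time.perf_counter()
        for line in lines:
            compiled.search(line)
        totals[name] = time.perf_counter() - start
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical patterns and input, purely for illustration.
    patterns = {
        "identifier": r"[A-Za-z_]\w*",
        "nested_quantifier": r"(\w+\s?)+;",
    }
    lines = ["alpha beta gamma delta"] * 3
    for name, seconds in rank_patterns_by_time(patterns, lines):
        print(f"{name:20s} {seconds:.6f}s")
```

Feeding such a harness the patterns of one syntax context at a time, together with the lines that trigger the pauses, is one way to narrow the search down to a single expensive pattern.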
Otherwise, potentially relevant (currently open) upstream issues are:
@vvuk Thank you for the detailed bug report!
Yes, exactly. Thank you for looking into this.
There have been issues with the C# syntax in the past (e.g. https://github.com/trishume/syntect/issues/63), but this seems to be a new issue (we already include the updated C# syntax - at least on `master`, where the problem is still present).
This should probably be reported upstream. The same thing happens when using `syncat`, which is bundled with `syntect`.
@keith-hall Notifying you, just in case you are interested (please let me know if you are not, in which case I am sorry and will stop pinging you!)
I can confirm this bug on Windows 10 with bat `0.12.1`. I can also confirm this on macOS.
The following behaviours have been identified with bat and the file provided:
| Command |
|---|
| `bat file.cs` |
| `bat file.cs -pp` |
| `bat file.cs --color=never` |
| `bat file.cs --color=never > NUL` |
| `bat file.cs --color=never --decorations=never` |
| `bat file.cs --color=never --decorations=never > file.txt` |
| `bat file.cs --color=always --decorations=always > file.txt` |
Trying with a large file of a different language (e.g. the saved HTML from here), I don’t notice this issue in any of the above cases.
If we consider the commands used for the C# file in the above table, the only cases where the issue wasn’t present are those where the simple printer is used (i.e. when the highlighter isn’t used at all).
Given the above, I agree that this appears to be an issue specifically with C# syntax highlighting. Since I’m not too familiar with the Syntect library, @sharkdp is likely going to need to be the one to figure out the reason why this is happening.