bat: slow output, with pauses, with c# source code

Something strange is going on. On a 1500-line .cs file, bat takes ~7s to list the file. --pager none, both with and without -p (not much difference with -p). I can see a pause after every1-2 dozen lines are output. --colors never also doesn’t make a difference (bench via hyperfine):

Command Mean [ms] Min [ms] Max [ms] Relative
cmd /c type EntityComponentStore.cs 86.1 ± 7.3 75.9 101.1 1.0
bat --pager none EntityComponentStore.cs 7734.8 ± 586.6 6857.4 8700.1 89.9
bat --pager none -p EntityComponentStore.cs 6982.9 ± 464.3 6290.5 7889.4 81.1
bat --color never --pager none EntityComponentStore.cs 7155.5 ± 460.3 6586.1 7804.1 83.1
bat --color never --pager none -p EntityComponentStore.cs 7053.2 ± 492.3 6409.1 8076.1 82.0

bat 0.12.1, msvc build on Windows, in a regular cmd.exe console window.

I’d expect some slowdown but it seems like something buggy is happening. If I do the same on a cpp file, also ~1500 lines:

Command Mean [ms] Min [ms] Max [ms] Relative
cmd /c type ApiDispatchers.cpp 98.4 ± 10.0 83.6 115.9 1.0
bat --pager none ApiDispatchers.cpp 768.5 ± 88.2 677.9 995.4 7.8
bat --pager none -p ApiDispatchers.cpp 683.8 ± 128.9 549.4 959.0 6.9
bat --color never --pager none ApiDispatchers.cpp 480.1 ± 46.7 440.2 584.2 4.9
bat --color never --pager none -p ApiDispatchers.cpp 447.6 ± 42.5 414.6 542.0 4.5

Which is much more in line with what I’d expect. Is there something different about how the syntax highlighting for C# is handled? Is it all regexp or something else expensive based?

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 17 (5 by maintainers)

Commits related to this issue

Most upvoted comments

may I suggest changing that pattern to \((?=(?:[^,)(]|\([^\)]*\))*,), it has the same functionality but without the catastrophic backtracking. I’m not sure when I’ll get time to open a PR on the sublimehq/Packages repo, so if this change works for you, feel free to beat me to it 😃

(I proved that the C# syntax tests aren’t adversely affected, nor is the performance in Sublime Text)

I’ve looked into this today.

First, I tried to minimize the example file above. I found that lines like this:

return a(bbbbbbbbbbbbbbbbbbbbbbbbbbbb.c(dddddddddddddddddddddddddddddddddddd));

take a very long time to parse. This particular one takes ~500 ms and the parsing time depends on the length of the b…b and d…d strings (in a somewhat complex way).

Next, I took a look at the C#.sublime-syntax. The problematic pattern is in the line_of_code_in_no_semicolon context:

    - match: \((?=(?:[^,)(]*|\([^\)]*\))*,)
      scope: punctuation.section.group.begin.cs
      push:
        # [...]

My (completely non-expert) guess would be that the nested (something*)* pattern causes some kind of combinatorial explosion.

Maybe @keith-hall could take a look at this (only if you are interested, of course)? 😊

If this pattern is disabled, C# files render reasonably fast:

bat CSharpCompilation.cs --color=always
Command Mean [s] Min [s] Max [s] Relative
master 9.555 ± 0.038 9.487 9.615 1.00
modified syntax 0.306 ± 0.006 0.297 0.319 1.00

Most likely it is another instance of a regex pattern used (perhaps often) in the C# syntax definition which has poor performance under the Oniguruma regex engine with many (perhaps simple) inputs. Probably the best way to start debugging this would be to play around in syntect to temporarily add some timers to see which patterns are problematic and then identify if there is a way to tweak those for better performance and make a PR to the sublimehq/Packages repository with those improvements. I will endeavor to do this when I am able to make time, but it may not be soon, so anyone else whom wishes to experiment/investigate: please do 😃

otherwise, potentially relevant (currently open) upstream issues are:

@vvuk Thank you for the detailed bug report!

If we consider the commands used for the C# file in the above table, the only times where the issue wasn’t present is when the simple printer is used (i.e. when the highlighter isn’t used at all).

Yes, exactly. Thank you for looking into this.

Given the above, I agree that this appears to be an issue specifically with C# syntax highlighting. Since I’m not too familiar with the Syntect library, @sharkdp is likely going to need to be the one to figure out the reason why this is happening.

There have been issues with the C# syntax in the past (e.g. https://github.com/trishume/syntect/issues/63), but this seems to be a new issue (we already include the updated C# syntax - at least on master, where the problem is still present).

This should probably be reported upstream. The same thing happens when using syncat, which is bundled with syntect.

@keith-hall Notifying you, just in case you are interested (please let me know if you are not, in which case I am sorry and will stop pinging you!)

I can confirm this bug on Windows 10 with bat 0.12.1. I can also confirm this on macOS.

The following behaviours have been identified with bat and the file provided:

Command Choppy
bat file.cs Yes
bat file.cs -pp Yes
bat file.cs --color=never Yes
bat file.cs --color=never > NUL No
bat file.cs --color=never --decorations=never Yes
bat file.cs --color=never --decorations=never > file.txt Yes
bat file.cs --color=always --decorations=always > file.txt Yes

Trying with a large file of a different language (e.g. the saved HTML from here), I don’t notice this issue in any of the above cases.

If we consider the commands used for the C# file in the above table, the only times where the issue wasn’t present is when the simple printer is used (i.e. when the highlighter isn’t used at all).

Given the above, I agree that this appears to be an issue specifically with C# syntax highlighting. Since I’m not too familiar with the Syntect library, @sharkdp is likely going to need to be the one to figure out the reason why this is happening.