linguist: syntax highlighting does not work when \b is used with the Russian alphabet
Good afternoon. We are creating syntax highlighting for 1C:Enterprise, which supports Russian keywords.
When we use \b(Если|If)\b
on GitHub (https://github-lightshow.herokuapp.com), the keywords are not highlighted. Written like this, it works: (?<=[^\w-а-яё\.]|^)(Если|If)(?=[^\w-а-яё\.]|$)
The files are in UTF-8.
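For anyone who wants to reproduce the effect outside GitHub, here is a minimal sketch in Python. It is only an analogy: Python's re module is not PCRE, and its re.ASCII flag is assumed here to stand in for PCRE running without Unicode properties. Python also rejects variable-width lookbehind, so the workaround is rewritten with negative lookarounds, which should accept the same positions as the original pattern. The names WORD_BOUNDARY, LOOKAROUND and find_keywords are made up for the illustration.

```python
import re

# The pattern from the report: \b relies on the engine's idea of a "word" character.
WORD_BOUNDARY = r"\b(Если|If)\b"

# Adapted workaround: the Cyrillic range is spelled out explicitly, so the result
# no longer depends on whether \w is Unicode-aware. Negative lookarounds replace
# the original (?<=[^...]|^) and (?=[^...]|$) forms, which Python's re cannot compile.
LOOKAROUND = r"(?<![\w\-а-яё.])(Если|If)(?![\w\-а-яё.])"


def find_keywords(pattern, text, flags=0):
    """Return every keyword captured by group 1 of `pattern` in `text`."""
    return [m.group(1) for m in re.finditer(pattern, text, flags)]


sample = "Если x Тогда; If y Then"

# ASCII-only mode: Cyrillic letters are not word characters, so \b never
# fires around "Если" and only the Latin keyword is found.
ascii_result = find_keywords(WORD_BOUNDARY, sample, re.ASCII)

# Unicode-aware mode (Python's default for str patterns): both keywords are found.
unicode_result = find_keywords(WORD_BOUNDARY, sample)

# The lookaround form finds both keywords even in ASCII-only mode.
workaround_result = find_keywords(LOOKAROUND, sample, re.ASCII)
```

Like the original workaround, the character class lists only lowercase Cyrillic letters, so a keyword directly followed by an uppercase Cyrillic letter would still be accepted.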
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 48 (21 by maintainers)
Ah right, that makes sense then. I wasn’t aware the flag imposed performance penalties (at least as far as the PCRE library is concerned).
I’d say your proposed solution is a good compromise. =) We could also only enable it for languages which use Unicode-sensitive grammars for highlighting, too (like 1C Enterprise’s one does). Personally, I feel that’s a more conservative approach; we could add a new option to languages.yml like unicode_pcre: true or something.
@worldbeater I doubt the PCRE version has anything to do with this. It’s running in ASCII-only mode, that’s all I know (I’m not staff). But if you check the manpage for pcresyntax(3), you’ll find documentation for variances between regular expression engines (starting at the section “Backreferences”).
These are the likely discrepancies that are affecting the C# grammar (indeed, most TextMate grammars which use Oniguruma extensions):
(?R)
(?n)
(?+n)
(?-n)
(?&name)
(?P>name)
\g<name>
\g'name'
\g<n>
\g'n'
\g<+n>
\g'+n'
\g<-n>
\g'-n'
Hahaha. I can answer this: we don’t currently use the PCRE_UCP flag because it’s a really significant performance degradation. I acknowledge this is not an ideal answer. I’m looking into the option of only enabling this flag on documents that we know contain extended Unicode characters, but we can’t enable it by default because it really slows down syntax highlighting. 😢
I don’t have an ETA but I promise we plan to look into improving highlighting for non-English, non-ASCII documents.
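The idea of turning on Unicode-aware matching only for documents that need it could look roughly like the sketch below. This is not how linguist or GitHub's highlighter is implemented. It is a Python analogy in which re.ASCII plays the role of the cheap default and re.UNICODE the role of PCRE_UCP, and the helper names highlight_flags and keywords_in are hypothetical.

```python
import re

KEYWORDS = r"\b(?:Если|If)\b"


def highlight_flags(text: str) -> int:
    """Hypothetical helper: stay in ASCII-only mode unless the document
    actually contains non-ASCII characters."""
    return re.ASCII if text.isascii() else re.UNICODE


def keywords_in(text: str) -> list:
    """Find keywords using whichever matching mode the document calls for."""
    return re.findall(KEYWORDS, text, highlight_flags(text))
```

A pure-ASCII file such as "If x Then" would keep the cheaper mode, while a file containing "Если x Тогда" would be matched with Unicode-aware word boundaries. Whether a per-document check like this is cheap enough in practice is exactly the open question in the comment above.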
(Apologies in advance, I don’t intend to hijack this conversation.)
In C# too, we are struggling to get C# 7.2 syntax support because upstream has moved to an Oniguruma grammar. @damieng has sketched out some ideas at https://github.com/atom/language-csharp/issues/112#issuecomment-379094384 by which we can try to make progress.
Since Oniguruma is mentioned in this thread, how feasible is it for linguist to support multiple engines? Is it too large an effort, or totally out of the scope of this project?
Don’t screw it up. The Russians are watching.
@worldbeater, PCRE supports every feature that Oniguruma does, it just uses different syntax. The perceived difference in regex support you see on GitHub is a consequence of grammar authors using Oniguruma-specific syntax instead of PCRE. This happens because Oniguruma is used by every editor which supports TextMate grammars – GitHub is a lone exception due to its use of PCRE.
If the engines were reversed, you’d be asking us to provide support for PCRE’s “complicated syntax” too, because the Oniguruma engine isn’t good enough.
One more ping, guys. (sorry)
I think this issue should be reopened. The problem still exists.
Just to clarify, is there any chance that someone on GitHub will come and replace the good old parser with a newer one, providing support for Oniguruma and other complicated regex syntaxes?
C# language highlighting on GitHub is absolutely disgusting and it would be nice to figure out how this can be fixed in the near future. Other platforms like BitBucket or GitLab provide much better highlighting, sad.
Thanks in advance!
That may be the right approach but I’d rather wait until we have a solid plan for supporting something like a language flag before going ahead and adding this to the languages.yml file.
This is a good option. @vmg @arfon is it possible to do that?
@vmg may be able to answer this
Okay… Is it easy to change one engine to another? Can we just ask the GitHub guys to make this change?