syntect: fancy-regex: problems with patterns in some syntaxes

I just tried the fancy-regex version of syntect for the first time (thank you to everyone involved!).

Here are a few of the problems I have encountered so far:

  • The PHP syntax in sublimehq/Packages (master) uses some sort of inline comment:
        - match: |-
          (?x)/\*\*(?:
            (?:\#@\+)?\s*$ (?# multi-line doc )
            |
            (?=\s+@.*\s\*/\s*$) (?# inline doc )
          )
    
    which causes the following error:
    Error while compiling regex '…': Unknown group flag
    
  • The AsciiDoc syntax (https://github.com/SublimeText/AsciiDoc/blob/master/AsciiDoc.tmLanguage) fails with:
    Error while compiling regex '(?x)^
    (?= ([/+-.*_=]{4,})\s*(?m:$)
    | ([ \t]{1,})
    | [=]{1,6}\s*+
    | [ ]{,3}(?<marker>[-*_])([ ]{,2}\k<marker>){2,}[ \t]*+(?m:$)
    )': Unknown group flag
    
    (note that this is after .sublime-syntax conversion and possibly regex rewriting in syntect).
    
  • The ARM Assembly syntax (https://github.com/tvi/Sublime-ARM-Assembly) fails with a similar error:
    Error while compiling regex '(?x)
    ^\s*\#\s*(define)\s+             # define
    ((?<id>[a-zA-Z_][a-zA-Z0-9_]*))  # macro name
    (?:                              # and optionally:
        (\()                         # an open parenthesis
            (
                \s* \g<id> \s*       # first argument
                ((,) \s* \g<id> \s*)*  # additional arguments
                (?:\.\.\.)?          # varargs ellipsis?
            )
        (\))                         # a close parenthesis
    )?': Unknown group flag
    
  • The Haskell/Cabal syntax (https://github.com/SublimeHaskell/SublimeHaskell) fails with:
    Error while compiling regex '(=>|\u21D2)\s+([A-Z][\w']*)': Invalid escape
    
  • Elixir (https://github.com/princemaple/elixir-sublime-syntax/) fails with a similar error:
    Error while compiling regex '(?x)
    \\g(?:
      <( ((?>-[1-9]\d*|\d+) | [a-zA-Z_][a-zA-Z_\d]{,31}) | \g<-1>?([^\[\\(){}|^$.?*+\n]+) )> | '\g<1>' |
      (<([^\[\\(){}|^$.?*+\n]+*)>? | '\g<-1>'?) )': Invalid escape
    
  • Same for JavaScript/Babel (https://github.com/babel/babel-sublime):
    Error while compiling regex '(?x)
    (?:([_$a-zA-Z][$\w]*)\s*(=)\s*)?
    (?:\b(async)\s+)?
    (?=(\((?>(?>[^()]+)|\g<-1>)*\))\s*(=>))': Invalid escape
    
  • The SLS syntax fails with:
    Error while compiling regex '(?x)
    (?: ^ [ \t]* | [ \t]+ )
    (?:(\#) \p{Print}* )?
    (\n|\z)
    ': Regex error: regex parse error:
        (?:^[ \t]*|[ \t]+)(?:(\#)\p{Print}*)?(\n|\z)
                                 ^^^^^^^^^
    error: Unicode property not found
    

This list continues for quite some time, but I’m not sure it’s worth listing them all. Most of them seem to be related to “unknown group flag” or “invalid escape”.

Note: I just wanted to try this out within bat, there is absolutely no “pressure” to get this fixed (as always, of course 😄). I was just curious and thought this might help.

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

fancy-regex 0.4.0 now supports named groups and backrefs, see changelog. \g and \G are not yet supported.

@Keats Published version 0.3.4 now: https://github.com/fancy-regex/fancy-regex/blob/master/CHANGELOG.md#034---2020-04-28

Note that it’s unlikely that I’ll implement \g<...> for subexp calls in the near future. I would recommend replacing them with {{variable}} references instead. That might even work better in Sublime Text itself, as I don’t think sregex implements that syntax either.
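
A minimal sketch of why the {{variable}} approach can stand in for non-recursive subexp calls such as the \g<id> in the ARM Assembly pattern above: a variable is plain textual substitution, so the engine never sees a call at all. The helper name expand_variables and the variable map are hypothetical and only illustrate the idea; this is not syntect's actual loader code, and recursive calls like \g<-1> in the Babel pattern cannot be expressed this way.

```rust
use std::collections::HashMap;

/// Replace every `{{name}}` in `pattern` with its definition from `vars`.
/// Hypothetical helper for illustration; unknown names are left untouched.
fn expand_variables(pattern: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut rest = pattern;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        // Look for the closing braces and a matching variable definition.
        let hit = after
            .find("}}")
            .and_then(|end| vars.get(&after[..end]).map(|value| (end, *value)));
        match hit {
            Some((end, value)) => {
                out.push_str(value);
                rest = &after[end + 2..];
            }
            None => {
                // Not a known variable: keep the braces literally and move on.
                out.push_str("{{");
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}
```

With a variable id defined as [a-zA-Z_][a-zA-Z0-9_]*, a fragment written as \s* {{id}} \s* expands to the same text the \g<id> call would have matched against, without needing subexp call support in the regex engine.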

cc @robinst any thoughts on what might need to be added to either the rewriter or fancy-regex to fix these, and any guesses as to how much work it would be?

Yeah, I think some of these should be easy to fix in fancy-regex, e.g. (?# ...) and \u, I’ll work on those first. Others, like named capture groups, are a bit more work, but they are on the radar: https://github.com/fancy-regex/fancy-regex/issues/34
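
Until the engine handles (?# ...) itself, one possible stopgap is to strip those comment groups from the pattern before compiling it. The following is only a sketch of that idea: strip_inline_comments is a hypothetical helper, not something syntect or fancy-regex provides, and it assumes that a comment ends at the first ")" and that "(?#" never appears inside a character class.

```rust
/// Remove `(?# ... )` comment groups from a regex pattern.
/// Hypothetical pre-processing helper, not part of syntect or fancy-regex.
/// Assumptions: a comment ends at the first `)`, and `(?#` never occurs
/// inside a character class.
fn strip_inline_comments(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut rest = pattern;
    while !rest.is_empty() {
        if rest.starts_with('\\') {
            // Copy an escape such as `\(` or `\#` verbatim (backslash + one char).
            let len = 1 + rest[1..].chars().next().map_or(0, |c| c.len_utf8());
            out.push_str(&rest[..len]);
            rest = &rest[len..];
        } else if rest.starts_with("(?#") {
            // Skip up to and including the closing `)`; drop the tail if unterminated.
            rest = match rest.find(')') {
                Some(end) => &rest[end + 1..],
                None => "",
            };
        } else {
            // Ordinary character: copy it and advance by its UTF-8 length.
            let len = rest.chars().next().map_or(0, |c| c.len_utf8());
            out.push_str(&rest[..len]);
            rest = &rest[len..];
        }
    }
    out
}
```

Applied to the PHP pattern from the report, this would drop the "(?# multi-line doc )" and "(?# inline doc )" groups and leave the rest of the expression unchanged.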