TypeScript: Compiler should error when encountering invalid regular expressions

In tests/cases/conformance/parser/ecmascript5/RegressionTests/parser579071.ts we have an invalid regex:

var x = /fo(o/;

This is not valid JavaScript, but we don’t give any errors. It would be helpful to let users know when their regular expressions are invalid.

About this issue

State: closed
Created 9 years ago
Reactions: 2
Comments: 15 (10 by maintainers)

Most upvoted comments

@zm-cttae @RyanCavanaugh There have already been attempts using this approach, such as #4387 and #35957, but they were closed. And this is definitely not a good solution, because:

If there are any errors, the whole RegExp expression is underlined, which is not useful, especially when the expression is long. Plus, the platform-specific error messages are not always clear, and we can’t translate them into other locales.
There might be multiple errors in the RegExp, but only the first error is revealed.
TypeScript is not limited to being executed with Node.js. It may also be run, for example, in a web browser via monaco-editor. That means that with the built-in RegExp constructor, the behavior is not guaranteed and may differ depending on which features the JS engine implements. In fact, I plan to include some features that are currently Stage 3 proposals in my implementation in advance.
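
For context, the constructor-based approach being criticised here can be sketched roughly as follows. This is only an illustration, not code from the linked PRs: the helper name `checkWithRegExpConstructor` and the `RegExpCheckResult` shape are made up for this example. It shows why the result is coarse: the host engine reports at most one `SyntaxError`, with an engine-specific message and no structured position.

```typescript
// Hypothetical result shape, used only for this illustration.
interface RegExpCheckResult {
  ok: boolean;
  // Engine-specific text; the wording differs between JS engines.
  message?: string;
}

// Hypothetical helper: delegates validation to the host engine's RegExp constructor.
function checkWithRegExpConstructor(pattern: string, flags: string = ""): RegExpCheckResult {
  try {
    new RegExp(pattern, flags);
    return { ok: true };
  } catch (e) {
    if (e instanceof SyntaxError) {
      // Only the first problem is reported, and there is no offset
      // that could be used to underline just the offending part.
      return { ok: false, message: e.message };
    }
    throw e;
  }
}

const unbalanced = checkWithRegExpConstructor("fo(o");
console.log(unbalanced.ok, unbalanced.message);
```

Whether a given pattern is accepted also depends on which regular expression features the host engine supports, which is the third objection in the list above.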

Luckily, a simple parser without node generation has little effect on performance, and I plan to make my PR available at the end of this month.

Besides, I have a follow-up proposal to further enforce type safety of RegExp-related methods and enhance UX (providing auto-completion) that makes use of the implementation, but that would be a separate issue and PR afterwards.

graphemecluster on Jul 7, 2023

Imbalanced parens / brackets seem like where 99% of the value is.

Duplicate flags seem like an error no one’s ever made before; usually it’s /g or /m, /gmi, etc. I can’t imagine writing /mgm unless a cat walked on my keyboard.

Erroring on escapes that are invalid regardless of flags seems like a fine compromise.

IMO it’s really actually fine if once a year your program unconditionally crashes on startup in the cases where you made an extremely rare mistake. The value is in flagging errors that are made every day.

RyanCavanaugh on Jun 23, 2023

Moved from #54744:

Having previously worked on #51837, I found that TypeScript gives almost no syntax errors for regular expressions. I would like to file a PR about it.

Some things I would like to do are:

Check for duplicated or unknown flags
Check for unbalanced parentheses, which is the most common mistake people make
Check for invalid escapes. But this should be done only for RegExps with the u or v flag, i.e. in UnicodeMode, which means that if we encounter a u or v flag we will need to rescan the whole RegExp again (!!) (i.e. redo what is done in the current reScanSlashToken method). And to check for invalid DecimalEscapes and \k<GroupName>s, we will also need to count the number of capture groups and record the names of all named capture groups along the way.

Am I doing too much or too little? I know doing too much may cause serious performance regressions (well, luckily regular expression literals are not that common compared with string literals). It should be better than doing nothing after all, though.
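
The first item in this list, checking for duplicated or unknown flags, might look something like the sketch below. The names `KNOWN_FLAGS`, `FlagError` and `checkFlags` are invented for this example and are not taken from the TypeScript code base. The flag set assumed here is d, g, i, m, s, u, v, y.

```typescript
// Assumed flag set: d, g, i, m, s, u, v, y.
const KNOWN_FLAGS = "dgimsuvy";

// Hypothetical error shape for this illustration.
interface FlagError {
  index: number; // offset of the flag within the flags string
  flag: string;
  kind: "unknown" | "duplicate";
}

// Hypothetical helper: reports every unknown or repeated flag, not just the first.
function checkFlags(flags: string): FlagError[] {
  const errors: FlagError[] = [];
  let seen = 0; // bitmask of flags encountered so far
  for (let i = 0; i < flags.length; i++) {
    const flag = flags.charAt(i);
    const bit = KNOWN_FLAGS.indexOf(flag);
    if (bit < 0) {
      errors.push({ index: i, flag, kind: "unknown" });
    } else if ((seen & (1 << bit)) !== 0) {
      errors.push({ index: i, flag, kind: "duplicate" });
    } else {
      seen |= 1 << bit;
    }
  }
  return errors;
}

console.log(checkFlags("mgm")); // one "duplicate" entry for the second "m"
```

Because each error carries an index, a diagnostic can point at the single offending flag rather than the whole literal. The sketch does not cover the combination of u and v, which is also invalid.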

graphemecluster on Jun 24, 2023

Surely this is a case of reserialising the raw regexp to its string representation and then passing it to the RegExp class? If it blows up, the string is invalid! That should be fast too.

zm-cttae on Jul 7, 2023

The downstream tools can re-parse if they really need that data.

We’re very sensitive to perf papercuts and not likely to accept the feature if the perf cost is nontrivial, and allocating more objects is something that is likely to incur broad perf hits due to slowing down GC, etc…

RyanCavanaugh on Jun 28, 2023

I feel like we should be able to do a parse-only pass (i.e. just scan and descend in order to validate) of the regex without creating nodes as you go, since there’s no consumer of that output, just the production of errors as a side effect.
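
As a rough illustration of such a validate-only pass, the sketch below walks a regex body once and reports unbalanced parentheses and unterminated character classes with their positions, without building any tree. It is a simplified example under stated assumptions, not the eventual implementation: `scanRegExpBody` and `RegExpDiagnostic` are hypothetical names, character classes are assumed not to nest (so v-flag class syntax is out of scope), and escapes are simply skipped rather than validated.

```typescript
// Hypothetical diagnostic shape for this illustration.
interface RegExpDiagnostic {
  pos: number; // offset within the regex body
  message: string;
}

// Hypothetical helper: one pass over the body, errors collected as a side effect.
function scanRegExpBody(body: string): RegExpDiagnostic[] {
  const diagnostics: RegExpDiagnostic[] = [];
  const openGroups: number[] = []; // offsets of "(" not yet closed
  let classStart = -1; // offset of the current "[", or -1 when outside a class
  for (let pos = 0; pos < body.length; pos++) {
    const ch = body.charAt(pos);
    if (ch === "\\") {
      pos++; // skip the escaped character; escapes are not validated here
      continue;
    }
    if (classStart >= 0) {
      // Inside a character class only "]" is significant.
      if (ch === "]") classStart = -1;
      continue;
    }
    if (ch === "[") {
      classStart = pos;
    } else if (ch === "(") {
      openGroups.push(pos);
    } else if (ch === ")") {
      if (openGroups.length === 0) {
        diagnostics.push({ pos, message: "Unexpected ')'" });
      } else {
        openGroups.pop();
      }
    }
  }
  if (classStart >= 0) {
    diagnostics.push({ pos: classStart, message: "Unterminated character class" });
  }
  for (const groupPos of openGroups) {
    diagnostics.push({ pos: groupPos, message: "Unterminated group" });
  }
  return diagnostics;
}

console.log(scanRegExpBody("fo(o")); // a single "Unterminated group" diagnostic at offset 2
```

The only state kept is a small stack of offsets and one integer, and diagnostics are allocated only when something is actually wrong, which is in the spirit of the allocation concern raised in this thread. Unlike the constructor approach, every problem is reported, each with its own position.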

RyanCavanaugh on Jun 27, 2023