TypeScript: Compiler should error when encountering invalid regular expressions
In tests/cases/conformance/parser/ecmascript5/RegressionTests/parser579071.ts we have an invalid regex:
var x = /fo(o/;
This is not valid JavaScript, but we don’t give any errors. It would be helpful to let users know when their regular expressions are invalid.
About this issue
- Original URL
- State: closed
- Created 9 years ago
- Reactions: 2
- Comments: 15 (10 by maintainers)
@zm-cttae @RyanCavanaugh There are already attempts like #4387 and #35957 using this approach but was closed. And this is definitely not a good solution, because:
Luckily a simple parser without node generation has little effect on the performance, and I plan to make my PR available at the end of this month.
Besides I have a follow-up proposal to further enforce type safety of RegExp-related methods and enhance UX (providing auto-completion) that make use of the implementation, but that would be another separate issue and PR afterwards.
Imbalanced parens / bracket seems like where 99% of the value is.
Duplicate flags seems like an error no one’s ever made before; usually its
/gor/m,/gmi, etc, I can’t imagine writing/mgmunless a cat walked on my keyboard.Erroring on escapes that are invalid regardless of flags seems like a fine compromise.
IMO it’s really actually fine if once a year your program unconditionally crashes on startup in the cases where you made an extremely rare mistake. The value is in flagging errors that are made every day.
Moved from #54744:
Previously worked on #51837, I found that TypeScript gives almost no syntax errors for regular expressions. I would like to file a PR about it.
Something I would like to do are:
uorvflag, i.e. inUnicodeMode, that means if we encounter auorvflag we will need to rescan the whole RegExp again (!!) (i.e. redoing what is done in the currentreScanSlashTokenmethod) And to check for invalidDecimalEscapes and k<GroupName>s we will also need to count the number of capture groups and record the names of all named capture groups along the way.Am I doing too much or too less? I know doing too much may cause serious performace regressions (well, luckily regular expression literals are not that common compared with string literals). It should be better than doing nothing after all though.
Surely this is a case of attempting to reserialise the raw regexp to the string representation then into RegExp class? If it blows up, the string is invalid! That should be fast too.
The downstream tools can re-parse if they really need that data.
We’re very sensitive to perf papercuts and not likely to accept the feature if the perf cost is nontrivial, and allocating more objects is something that is likely to incur broad perf hits due to slowing down GC, etc…
I feel like we should be able to do a parse-only pass (i.e. just scan and descend in order to validate) of the regex without creating nodes as you go, since there’s no consumer of that output, just the production of errors as a side effect