meriyah: String literals are incorrectly parsed
subj
module.exports = '\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028\u2029\uFEFF';
Here is the source: https://raw.githubusercontent.com/zloirock/core-js/master/packages/core-js/internals/whitespaces.js
About this issue
- State: closed
- Created 4 years ago
- Comments: 25 (20 by maintainers)
I see multiple technical issues involved in this thread.
\u180e
Correct. When MONGOLIAN VOWEL SEPARATOR was introduced, it was categorized as Zs (whitespace); in 2013 it was changed to Cf (ref: https://www.unicode.org/L2/L2013/13004-vowel-sep-change.pdf), and the change was published in Unicode version 7.0. Unfortunately, such a change will take decades to propagate to every downstream project of Unicode. So please file a bug on angular: \u180e should not be included in WS_CHARS.
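A quick way to see the mismatch is to ask the JavaScript engine itself. The sketch below assumes a runtime with Unicode property escapes (ES2018 or later). WS_CHARS_SAMPLE is a hypothetical stand-in for a whitespace list that still contains \u180e; it is not the actual angular constant.

```javascript
// Hypothetical sample: a whitespace list that still includes U+180E.
// This is NOT the real angular WS_CHARS value, only an illustration.
const WS_CHARS_SAMPLE = '\u0020\u00A0\u1680\u180E\u2000\u3000';

// Return the characters that the engine's \s class does not match.
function nonWhitespaceChars(chars) {
  return Array.from(chars).filter((ch) => !/\s/.test(ch));
}

// Format a single BMP character as a \uXXXX escape for readable output.
function toEscape(ch) {
  return '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0');
}

const offenders = nonWhitespaceChars(WS_CHARS_SAMPLE).map(toEscape);
console.log(offenders);

// Unicode property escapes expose the General_Category the engine uses.
console.log(/\p{Zs}/u.test('\u180E'), /\p{Cf}/u.test('\u180E'));
```

On an engine whose Unicode data already reflects the category change, only \u180E should be reported, which is why a list containing it disagrees with what the parser treats as whitespace.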
weird looking texts on REPL
There are three red dots in the parsed "value" key of the string literal. They represent \u2028, \u2029 and \ufeff respectively. The Meriyah REPL uses CodeMirror to pretty-print the AST, and CodeMirror uses \u2022 (Bullet) to represent a “special char”, so a red dot is printed.
https://github.com/codemirror/CodeMirror/blob/01758b19565384414306816b43b5f35d81f039a3/src/line/line_data.js#L122
Note that when you copy from the AST, CodeMirror gives you the raw text, so you can compare it to the escaped version in your DevTools console. (Yes, Chrome DevTools also uses CodeMirror.)
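To make that comparison without depending on how an editor renders special characters, the copied value can be turned back into escapes. This is a minimal sketch; escapeNonAscii is a hypothetical helper name and is not part of meriyah or CodeMirror.

```javascript
// Hypothetical helper: rewrite every UTF-16 code unit outside printable
// ASCII (0x20..0x7E) as a \uXXXX escape, so invisible characters show up.
function escapeNonAscii(str) {
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code >= 0x20 && code <= 0x7e) {
      out += str[i];
    } else {
      out += '\\u' + code.toString(16).toUpperCase().padStart(4, '0');
    }
  }
  return out;
}

// The tail of the whitespace string from the issue, as it would be copied
// from the "value" key of the AST.
const copied = '\u3000\u2028\u2029\uFEFF';
console.log(escapeNonAscii(copied));
```

If the escaped output matches the escapes in the original source, the parser kept the value intact and the red dots are only a display choice.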
how it can break an app
I have no idea how a parser can break an app without generating the app code from the parsed AST. So I guess here is the process:
For example, astring is a generator that can print an ESTree AST (generated by meriyah) as JavaScript code. TypeScript has a built-in parser and generator. One may also have their own generator.
In this case it can break the app because there are \u2028 and \u2029 in the literal. When a generator prints decl.init.value directly into a quoted string, the generated code will break on legacy platforms because \u2028 and \u2029 must be escaped in string literals prior to ES2019 (https://ecma-international.org/ecma-262/#sec-intro). Since \u2028 and \u2029 are not stored in their equivalent escaped form in decl.init.value, the generator may print the unescaped characters to the source.
To preserve the raw text of the string literal, you can pass raw: true in the meriyah options, which will add a "raw" property to the literal node. The generator may then print the string literal using decl.init.raw. If you are using your own generator, please revise it to use decl.init.raw.
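The difference between printing the value and printing the raw text can be sketched as follows. This is only an illustration under stated assumptions: the node is written by hand to mimic an ESTree Literal with a "raw" property (no parser is called), and printNaive and printSafe are hypothetical names, not functions from astring or meriyah.

```javascript
// Hand-written stand-in for the Literal node a parser would produce for
// the source text 'a\u2028b' when the raw text is preserved.
const literal = {
  type: 'Literal',
  value: 'a\u2028b',    // cooked value: contains a real U+2028 character
  raw: "'a\\u2028b'",   // raw source text: still contains the escape
};

// Naive printing: wraps the cooked value in quotes, so U+2028 ends up
// unescaped in the generated source, which pre-ES2019 engines reject.
function printNaive(node) {
  return "'" + node.value + "'";
}

// Safer printing: prefer the raw text; otherwise serialize the value and
// escape the two line separators explicitly (JSON.stringify leaves them as-is).
function printSafe(node) {
  if (typeof node.raw === 'string') return node.raw;
  return JSON.stringify(node.value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

console.log(printNaive(literal).includes('\u2028'));
console.log(printSafe(literal));
```

The fallback branch matters for nodes created by transforms, which typically have no raw text to reuse.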
I’ll just make it clear as I found the original problem. All this stuff is borderline black magic so I think we all need to take a step back and appreciate for a second how hard this shit is and how big-brained we all are. It’s basically computer science. Coming from a lowly angular developer.
I just want to build my angular app in ES5 as I have IE11-using customers. If I use meriyah, it breaks in this single and specific way. If I use ts, it builds fine but much slower. Can we focus on just solving this and moving forward pls