serenity: LibJS: Lexer reports incorrect column for tokens

When lexing a piece of JS source for the console widget, I loop over the tokens to add styling information. However, the column numbers reported by these tokens appear to be incorrect. To make them usable I have to subtract 2, and I’m not sure why. Even after that subtraction, the output is sometimes still wrong, for example with let x = 10. Perhaps the problem is in the console output code, but I can’t see what it is doing wrong.

@linusg I believe you are the person to tag for LibJS 😃

See this code from #2375.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

diff --git a/Libraries/LibJS/Lexer.cpp b/Libraries/LibJS/Lexer.cpp
index dc094f4f5..f67d8ff9b 100644
--- a/Libraries/LibJS/Lexer.cpp
+++ b/Libraries/LibJS/Lexer.cpp
@@ -149,8 +149,12 @@ Lexer::Lexer(StringView source)

 void Lexer::consume()
 {
-    if (m_position >= m_source.length()) {
-        m_position = m_source.length() + 1;
+    if (m_position > m_source.length())
+        return;
+
+    if (m_position == m_source.length()) {
+        m_position++;
+        m_line_column++;
         m_current_char = EOF;
         return;
     }
diff --git a/Libraries/LibJS/Lexer.h b/Libraries/LibJS/Lexer.h
index e60c2dd65..71f22b053 100644
--- a/Libraries/LibJS/Lexer.h
+++ b/Libraries/LibJS/Lexer.h
@@ -55,11 +55,11 @@ private:
     bool match(char, char, char, char) const;

     StringView m_source;
-    size_t m_position = 0;
+    size_t m_position { 0 };
     Token m_current_token;
-    int m_current_char = 0;
-    size_t m_line_number = 1;
-    size_t m_line_column = 1;
+    int m_current_char { 0 };
+    size_t m_line_number { 1 };
+    size_t m_line_column { 0 };

     struct TemplateState {
         bool in_expr;

Ultimately there are two issues.

  1. m_line_column is initialized to 1, which leads to incorrect column numbers for tokens on the first line.
  2. After reading EOF, m_line_column isn’t incremented one last time, so the last token gets the wrong column.

After playing whack-a-mole for a while I think the above patch finally fixes both problems.
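
To make the two off-by-one errors easier to see, here is a minimal sketch of the column bookkeeping, not the real LibJS code. MiniLexer and MiniToken are hypothetical names, the tokenizer only splits on spaces and newlines, and it assumes (based on the discussion in this thread) that the constructor primes the first character with one consume() call and that a token's column is computed as m_line_column minus the token length (m_position - value_start). The patched flag switches between the old and the new consume() logic from the diff above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lexer token: value plus 1-based position.
struct MiniToken {
    std::string value;
    std::size_t line;
    std::size_t column;
};

// Hypothetical model of the lexer's position tracking (not the LibJS class).
class MiniLexer {
public:
    MiniLexer(std::string source, bool patched)
        : m_source(std::move(source))
        , m_patched(patched)
        , m_line_column(patched ? 0 : 1) // issue 1: the old code started at 1
    {
        consume(); // assumption: the constructor reads the first character
    }

    std::vector<MiniToken> tokens()
    {
        std::vector<MiniToken> result;
        while (m_current_char != -1) {
            if (m_current_char == ' ' || m_current_char == '\n') {
                consume();
                continue;
            }
            // m_position is always one past the current character.
            std::size_t value_start = m_position;
            while (m_current_char != -1 && m_current_char != ' ' && m_current_char != '\n')
                consume();
            std::size_t length = m_position - value_start;
            result.push_back({ m_source.substr(value_start - 1, length),
                               m_line_number, m_line_column - length });
        }
        return result;
    }

private:
    void consume()
    {
        if (m_patched) {
            if (m_position > m_source.length())
                return;
            if (m_position == m_source.length()) {
                m_position++;
                m_line_column++; // issue 2: count the step onto EOF as well
                m_current_char = -1;
                return;
            }
        } else if (m_position >= m_source.length()) {
            // Old behaviour: position moves past the end, column does not.
            m_position = m_source.length() + 1;
            m_current_char = -1;
            return;
        }
        if (m_current_char == '\n') {
            m_line_number++;
            m_line_column = 1;
        } else {
            m_line_column++;
        }
        m_current_char = static_cast<unsigned char>(m_source[m_position++]);
    }

    std::string m_source;
    bool m_patched;
    std::size_t m_position { 0 };
    int m_current_char { 0 }; // -1 stands in for EOF
    std::size_t m_line_number { 1 };
    std::size_t m_line_column;
};
```

Under these assumptions, lexing "let x\n}" with patched set to false reports columns 2, 6 and 0 for let, x and }, while patched set to true reports 1, 5 and 1. Only the first line and the token that ends at EOF are affected, because every newline resets m_line_column to 1 in both variants.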

(notice in the test program only the tokens in the first line are off)

Not true: the last line has line_column = 0 for the CurlyClose token, which should be 1.

I believe this is because the last token gets the wrong length when it is read:

if (m_position >= m_source.length()) {
    m_position = m_source.length() + 1;
    m_current_char = EOF;
    return;
}

When m_position is m_source.length(), it adds 1 for some reason, so the length (m_position - value_start) becomes 2, which makes line_column 0.

@linusg I believe you are the person to tag for LibJS 😃

don’t have to, but I appreciate it 😃

I’ll have a look.