lark: Working grammar on Lark 0.9.0 throws IndexError on post-0.9.0

Describe the bug

My parser runs on Lark 0.9.0, but throws IndexError on both Lark 0.10.1 and master branch. The parser is Earley and uses the same custom indenter as for Python examples.

I reported this bug earlier in Gitter, sharing a stack dump and some local var values, but did not yet have a reduced example. Now I do:

To Reproduce

Run with python <filename>:

https://pastebin.com/HMA4kXyz

"""
IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py

Works in Lark 0.9, but not 0.10.1 or github master on 2020-11-12
Python 3.8.5 64-bit on Win 10 Pro

python IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py

Expected output is:
    code: ...
    parser: ...
    tree: ...
    Done.

Works on Lark 0.9.0
On Lark 0.10.1 or latest master branch, raises IndexError:
"""
import lark
import lark.indenter
code = '''\
if a is b
    print(a)
'''
# improper subset of one of the Python grammar examples:
grammar = r"""
file_input: (_NEWLINE | stmt)*

?stmt: simple_stmt | compound_stmt
?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE
?small_stmt: (NAME test)

?compound_stmt: if_stmt
if_stmt: "if" test ":"? suite
suite: (","? simple_stmt) | (_NEWLINE _INDENT stmt+ _DEDENT)

?test: binary_bool_test
?binary_bool_test: not_test (("and" | "or") not_test)*
?not_test: "not" not_test -> not | comparison
?comparison: expr (_comp_op expr)*
star_expr: "*" expr
?expr: shift_expr (("bor" | "band" | "bxor") shift_expr)*
?shift_expr: arith_expr (_shift_op arith_expr)*
?arith_expr: term (_add_op term)*
?term: factor (_mul_op factor)*
?factor: _factor_op factor | atom_expr

_factor_op: "+" | "-"
_add_op:    "+" | "-"
_shift_op:  "<<" | ">>"
_mul_op:    "*" | "/" | "%" | "//"
_comp_op:   "<" | ">" | "==" | ">=" | "<=" | "<>" | "in" | "is"

?atom_expr: atom_expr "(" [arguments] ")" -> func_call
          | atom

?atom: "(" test ")"
     | name
     | number

name: NAME
?number: INT_LITERAL_DEC

?testlist_comp: (test|star_expr) [("," (test|star_expr))+ [","] | ","]
exprlist: (expr|star_expr) ("," (expr|star_expr))* [","]
testlist: test ("," test)* [","]

arguments: argvalue ("," argvalue)*  ("," [starargs | kwargs])?
         | starargs
         | kwargs

starargs: "*" test ("," "*" test)* ("," argvalue)* ["," kwargs]
kwargs: "**" test

?argvalue: test ("=" test)?

NAME: /[a-zA-Z_]\w*/
_NEWLINE: ( /\r?\n[\t ]*/  )+
INT_LITERAL_DEC:   /0|[1-9]\d*/i

%ignore /[\t \f]+/  // WS
%ignore /\\[\t \f]*\r?\n/   // LINE_CONT
%declare _INDENT _DEDENT
"""

# for Pythonic indentation:
class CustomIndenter(lark.indenter.Indenter):
    NL_type = '_NEWLINE'
    OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

def parser():
    return lark.Lark(
        grammar,
        debug=True,
        parser='earley',      # lalr, earley
        lexer='standard',     # in ('standard', 'contextual', 'dynamic', 'dynamic_complete') or issubclass(lexer, Lexer)
        postlex=CustomIndenter(),
        start='file_input',
        keep_all_tokens=True,
        maybe_placeholders=True,
        propagate_positions=True,
        ambiguity='resolve')  # in ('resolve', 'explicit', 'auto')

print('code:')
print(code)

p = parser()
print('parser:', p)
tree = p.parse(code)
print('tree:', tree)

print('Done.')

The exception:

Traceback (most recent call last):
  File "IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py", line 104, in <module>
    tree = p.parse(code)
  File "...\lark-github\lark\lark.py", line 494, in parse
    return self.parser.parse(text, start=start)
  File "...\lark-github\lark\parser_frontends.py", line 138, in parse
    return self._parse(start, self.make_lexer(text))
  File "...\lark-github\lark\parser_frontends.py", line 73, in _parse
    return self.parser.parse(input, start, *args)
  File "...\lark-github\lark\parsers\earley.py", line 320, in parse
    return transformer.transform(solutions[0])
  File "...\lark-github\lark\parsers\earley_forest.py", line 353, in transform
    self.visit(root)
  File "...\lark-github\lark\parsers\earley_forest.py", line 295, in visit
    vpno(current)
  File "...\lark-github\lark\parsers\earley_forest.py", line 587, in visit_packed_node_out
    super(ForestToParseTree, self).visit_packed_node_out(node)
  File "...\lark-github\lark\parsers\earley_forest.py", line 417, in visit_packed_node_out
    transformed = self.transform_packed_node(node, self.data[id(node)])
  File "...\lark-github\lark\parsers\earley_forest.py", line 570, in transform_packed_node
    return self._call_rule_func(node, children)
  File "...\lark-github\lark\parsers\earley_forest.py", line 524, in _call_rule_func
    return self.callbacks[node.rule](data)
  File "...\lark-github\lark\parse_tree_builder.py", line 29, in __call__
    res = self.node_builder(children)
  File "...\lark-github\lark\parse_tree_builder.py", line 129, in __call__
    filtered.append(children[i])
IndexError: list index out of range

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 15 (10 by maintainers)

Commits related to this issue

Most upvoted comments

I have bisected the issue to 9db869c. I am close to a fix and will make a PR soon.

Thank you, everybody. Lark gets better and better.

Thanks, I am getting the IndexError now. I will add the test to the PR.

Here is a reduce version:

start: "a" NEWLINE INDENT "b" NEWLINE DEDENT

NEWLINE: ( /\r?\n */  )+

%ignore " "+
%declare INDENT DEDENT

for this input:

a
    b

And this postlexer

class CustomIndenter(lark.indenter.Indenter):
    NL_type = 'NEWLINE'
    OPEN_PAREN_types = []
    CLOSE_PAREN_types = []
    INDENT_type = 'INDENT'
    DEDENT_type = 'DEDENT'
    tab_len = 8

with these options:

        parser='earley',
        lexer='standard',
        postlex=CustomIndenter()

This should probably put inside of _make_parser_test and disabled for LEXER='dynamic', like the other test for the postlexer.

It is probably possible to reduce this further by writting another PostLexer, but I don’t think that is necessary for this test.

I will do some debugging to try to pinpoint the issue. If I cannot, I will use your program as a test case.