lark: Working grammar on Lark 0.9.0 throws IndexError on post-0.9.0
Describe the bug
My parser runs on Lark 0.9.0, but throws IndexError on both Lark 0.10.1 and master branch. The parser is Earley and uses the same custom indenter as for Python examples.
I reported this bug earlier in Gitter, sharing a stack dump and some local var values, but did not yet have a reduced example. Now I do:
To Reproduce
Run with python <filename>:
"""
IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py
Works in Lark 0.9, but not 0.10.1 or github master on 2020-11-12
Python 3.8.5 64-bit on Win 10 Pro
python IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py
Expected output is:
code: ...
parser: ...
tree: ...
Done.
Works on Lark 0.9.0
On Lark 0.10.1 or latest master branch, raises IndexError:
"""
import lark
import lark.indenter
code = '''\
if a is b
print(a)
'''
# improper subset of one of the Python grammar examples:
grammar = r"""
file_input: (_NEWLINE | stmt)*
?stmt: simple_stmt | compound_stmt
?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE
?small_stmt: (NAME test)
?compound_stmt: if_stmt
if_stmt: "if" test ":"? suite
suite: (","? simple_stmt) | (_NEWLINE _INDENT stmt+ _DEDENT)
?test: binary_bool_test
?binary_bool_test: not_test (("and" | "or") not_test)*
?not_test: "not" not_test -> not | comparison
?comparison: expr (_comp_op expr)*
star_expr: "*" expr
?expr: shift_expr (("bor" | "band" | "bxor") shift_expr)*
?shift_expr: arith_expr (_shift_op arith_expr)*
?arith_expr: term (_add_op term)*
?term: factor (_mul_op factor)*
?factor: _factor_op factor | atom_expr
_factor_op: "+" | "-"
_add_op: "+" | "-"
_shift_op: "<<" | ">>"
_mul_op: "*" | "/" | "%" | "//"
_comp_op: "<" | ">" | "==" | ">=" | "<=" | "<>" | "in" | "is"
?atom_expr: atom_expr "(" [arguments] ")" -> func_call
| atom
?atom: "(" test ")"
| name
| number
name: NAME
?number: INT_LITERAL_DEC
?testlist_comp: (test|star_expr) [("," (test|star_expr))+ [","] | ","]
exprlist: (expr|star_expr) ("," (expr|star_expr))* [","]
testlist: test ("," test)* [","]
arguments: argvalue ("," argvalue)* ("," [starargs | kwargs])?
| starargs
| kwargs
starargs: "*" test ("," "*" test)* ("," argvalue)* ["," kwargs]
kwargs: "**" test
?argvalue: test ("=" test)?
NAME: /[a-zA-Z_]\w*/
_NEWLINE: ( /\r?\n[\t ]*/ )+
INT_LITERAL_DEC: /0|[1-9]\d*/i
%ignore /[\t \f]+/ // WS
%ignore /\\[\t \f]*\r?\n/ // LINE_CONT
%declare _INDENT _DEDENT
"""
# for Pythonic indentation:
class CustomIndenter(lark.indenter.Indenter):
NL_type = '_NEWLINE'
OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
INDENT_type = '_INDENT'
DEDENT_type = '_DEDENT'
tab_len = 8
def parser():
return lark.Lark(
grammar,
debug=True,
parser='earley', # lalr, earley
lexer='standard', # in ('standard', 'contextual', 'dynamic', 'dynamic_complete') or issubclass(lexer, Lexer)
postlex=CustomIndenter(),
start='file_input',
keep_all_tokens=True,
maybe_placeholders=True,
propagate_positions=True,
ambiguity='resolve') # in ('resolve', 'explicit', 'auto')
print('code:')
print(code)
p = parser()
print('parser:', p)
tree = p.parse(code)
print('tree:', tree)
print('Done.')
The exception:
Traceback (most recent call last):
File "IndexError_ChildFilterLALR_NoPlaceholders_2020-11.py", line 104, in <module>
tree = p.parse(code)
File "...\lark-github\lark\lark.py", line 494, in parse
return self.parser.parse(text, start=start)
File "...\lark-github\lark\parser_frontends.py", line 138, in parse
return self._parse(start, self.make_lexer(text))
File "...\lark-github\lark\parser_frontends.py", line 73, in _parse
return self.parser.parse(input, start, *args)
File "...\lark-github\lark\parsers\earley.py", line 320, in parse
return transformer.transform(solutions[0])
File "...\lark-github\lark\parsers\earley_forest.py", line 353, in transform
self.visit(root)
File "...\lark-github\lark\parsers\earley_forest.py", line 295, in visit
vpno(current)
File "...\lark-github\lark\parsers\earley_forest.py", line 587, in visit_packed_node_out
super(ForestToParseTree, self).visit_packed_node_out(node)
File "...\lark-github\lark\parsers\earley_forest.py", line 417, in visit_packed_node_out
transformed = self.transform_packed_node(node, self.data[id(node)])
File "...\lark-github\lark\parsers\earley_forest.py", line 570, in transform_packed_node
return self._call_rule_func(node, children)
File "...\lark-github\lark\parsers\earley_forest.py", line 524, in _call_rule_func
return self.callbacks[node.rule](data)
File "...\lark-github\lark\parse_tree_builder.py", line 29, in __call__
res = self.node_builder(children)
File "...\lark-github\lark\parse_tree_builder.py", line 129, in __call__
filtered.append(children[i])
IndexError: list index out of range
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 15 (10 by maintainers)
Commits related to this issue
- Fix IndexError (issue #754) — committed to chanicpanic/lark by chanicpanic 4 years ago
I have bisected the issue to 9db869c. I am close to a fix and will make a PR soon.
Thank you, everybody. Lark gets better and better.
Thanks, I am getting the IndexError now. I will add the test to the PR.
Here is a reduce version:
for this input:
And this postlexer
with these options:
This should probably put inside of
_make_parser_testand disabled forLEXER='dynamic', like the other test for the postlexer.It is probably possible to reduce this further by writting another PostLexer, but I don’t think that is necessary for this test.
I will do some debugging to try to pinpoint the issue. If I cannot, I will use your program as a test case.