lark: Bug in handling ambiguity?
When running this code:
grammar = """
expression: "c" | "d" | "c" "d"
unit: expression "a"
| "a" expression
| "b" unit
| "b" expression
start: unit*
%import common.WS
%ignore WS
"""
l = Lark(grammar, parser='earley', ambiguity='explicit')
print(l.parse('b c d a a c').pretty())
It is expected to have an ambiguous parse, but there is no ‘_ambig’ node.
At least these options are valid:
unit(
b
unit(
expression(
c
d
)
a
)
)
unit(
a
expression(
c
)
)
and this parse:
unit(
b
expression(
c
)
)
unit(
expression(
d
)
a
)
unit(
a
expression(
c
)
)
The only parse that comes back is the second one.
When one removes the "b" expression
option, you get the first one.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 67 (31 by maintainers)
Commits related to this issue
- Bugfix #21: Can now handle recursive ambiguity while still defending against infinite recursion — committed to lark-parser/lark by erezsh 7 years ago
- BUGFIX: Tokens of different type were equal, causing disambiguation errors (Issue #21) — committed to lark-parser/lark by erezsh 7 years ago
Fixed! At the very least, the example you gave me now runs almost instantly. Please check if latest master fixes your issues.
Turns out it was a very simple bug, but it was a real flaw in my implementation. So thank you for reporting it and helping me weed it out.
See if you can find an example of name length affecting speed. I’ll be very happy to fix it if it’s true.
In
master
.Okay, all my tests are passing now. I pushed the most recent changes to the branch: earley_fix2
If you still have issues with the priority, an example would really help.
I’ll see if I can explain it in simple words. The slowdown was caused by similar parse-items that had different trees. Every different possible derivation was added as a separate parse-item, that would later merge as ambiguity. This created thousands of parse-items at every step, and caused Earley to choke. I made a very subtle change: I merged the similar parse-items into a single parse-item, and added their trees as ambiguity earlier in the process.
You could think of this as an optimization, but really, the previous implementation was just wrong. I still need to solve some subtleties that resulted from this change, and then I plan to merge it into master.
Anyway, I’m glad it solved the issue for you!
wow X20-30 faster (similar to what it was before) almost no diffs I’m happy with this version
i’m curious as to what you did tho…