lark: Bug in handling ambiguity?

When running this code:

grammar = """
expression: "c" | "d" | "c" "d"
unit: expression "a"
    | "a" expression
    | "b" unit
    | "b" expression
start: unit*

%import common.WS
%ignore WS
"""

l = Lark(grammar, parser='earley', ambiguity='explicit')
print(l.parse('b c d a a c').pretty())

It is expected to have an ambiguous parse, but there is no ‘_ambig’ node.

At least these options are valid:

unit(
    b
    unit(
        expression(
            c
            d
        )
        a
    )
)
unit(
    a
    expression(
        c
    )
)

and this parse:

unit(
    b
    expression(
        c
    )
)
unit(
    expression(
        d
    )
    a
)
unit(
    a
    expression(
        c
    )
)

The only parse that comes back is the second one. When one removes the "b" expression option, you get the first one.

About this issue

Original URL
State: closed
Created 7 years ago
Comments: 67 (31 by maintainers)

Commits related to this issue

Bugfix #21: Can now handle recursive ambiguity while still defending against infinite recursion — committed to lark-parser/lark by erezsh 7 years ago
BUGFIX: Tokens of different type were equal, causing disambiguation errors (Issue #21) — committed to lark-parser/lark by erezsh 7 years ago

Most upvoted comments

Fixed! At the very least, the example you gave me now runs almost instantly. Please check if latest master fixes your issues.

Turns out it was a very simple bug, but it was a real flaw in my implementation. So thank you for reporting it and helping me weed it out.

See if you can find an example of name length affecting speed. I’ll be very happy to fix it if it’s true.

erezsh on Oct 29, 2017

CAN DO

In master.

erezsh on Dec 11, 2017

Okay, all my tests are passing now. I pushed the most recent changes to the branch: earley_fix2

If you still have issues with the priority, an example would really help.

erezsh on Oct 31, 2017

I’ll see if I can explain it in simple words. The slowdown was caused by similar parse-items that had different trees. Every different possible derivation was added as a separate parse-item, that would later merge as ambiguity. This created thousands of parse-items at every step, and caused Earley to choke. I made a very subtle change: I merged the similar parse-items into a single parse-item, and added their trees as ambiguity earlier in the process.

You could think of this as an optimization, but really, the previous implementation was just wrong. I still need to solve some subtleties that resulted from this change, and then I plan to merge it into master.

Anyway, I’m glad it solved the issue for you!

erezsh on Oct 31, 2017

wow X20-30 faster (similar to what it was before) almost no diffs I’m happy with this version

i’m curious as to what you did tho…

uriva on Oct 31, 2017