antlr4: Single-token deletion error node not in parse tree
Hi there,
When single-token deletion kicks in and skips a token in a rule with alternative labels (it entered the rule but didn’t find the alternative yet) the error node that is created is not attached to the parse tree. If the extra token is present after a valid rule alternative is found then the error node representing the token is correctly present in the parse tree.
This is due to the ParseRuleContext.CopyFrom method not copying children (at least error nodes). See the comments on the generated code:
private VariableExprContext variableExpr(int _p) throws RecognitionException {
ParserRuleContext _parentctx = _ctx;
int _parentState = getState();
VariableExprContext _localctx = new VariableExprContext(_ctx, _parentState); // Base class of alt rule with labels (correct alt not found yet).
VariableExprContext _prevctx = _localctx;
int _startState = 6;
enterRecursionRule(_localctx, 6, RULE_variableExpr, _p);
int _la;
try {
int _alt;
enterOuterAlt(_localctx, 1);
{
setState(36);
_errHandler.sync(this); // Single-token deletion skips extra token and adds an error node to the children of _localctx (VariableExprContext).
switch ( getInterpreter().adaptivePredict(_input,3,_ctx) ) { // A valid alternative is found.
case 1:
{
_localctx = new LiteralExprContext(_localctx); // Simple fields(properties are copied here but not children (which include the sinlge-token deletion error node).
_ctx = _localctx;
_prevctx = _localctx;
setState(33);
literal();
}
break;
Some more context and a concrete example: We are using Antlr 4.5.3 to build a proper grammar/parser for an existing template engine. Target languages are C# and java but we have been working mainly with C# so far. We have both a lexer and parser files to use Antlr modes. We are writing and testing the grammar using the Antlr plugin for IntelliJ. Note that for this issue the IntelliJ plugin is misleading as it shows the missing error node in the tree which is not the case for the TestRig (-gui) (parse tree vs listener issue??).
In our template engine, dynamic expressions in text must be inside ${ expr } . Expressions mainly support basic operators, a few functions, identifiers and number literals. Our lexer file has a catch-all rule to match any unrecognized character and we skip (hidden channel) whitespace characters.
So expression ${abc} is valid. Expression ${ ? abc ? } with 2 single invalid ‘?’ tokens before and after a valid labeled alternative produces a parse tree without the error node for the first token. See the tree for ${ ? abc ? } :
Please note that to reproduce the issue with this simpler version of our grammar I had to generate it with the -xForce-atn option, otherwise it was generating LL1 code which doesn’t call errHandler.sync(this) before finding alternative (see the string template for C# extracts below for LL1 and LL). Which leads me to another issue/question: _why does LL1 generated code not do single-token deletion?*
// LL(*) stuff
AltBlock(choice, preamble, alts, error) ::= <<
State = <choice.stateNumber>;
ErrorHandler.Sync(this); // CALLING THE ERROR RECOVERY STRATEGY?
<if(choice.label)><labelref(choice.label)> = TokenStream.Lt(1);<endif>
<preamble; separator="\n">
switch ( Interpreter.AdaptivePredict(TokenStream,<choice.decision>,Context) ) {
<alts:{alt |
case <i>:
<alt>
break;}; separator="\n">
}
>>
Line 538
LL1AltBlock(choice, preamble, alts, error) ::= <<
State = <choice.stateNumber>; // NO CALL TO ErrorHandler.Sync(this);
<if(choice.label)><labelref(choice.label)> = TokenStream.Lt(1);<endif>
<preamble; separator="\n">
switch (TokenStream.La(1)) {
<choice.altLook,alts:{look,alt| <cases(ttypes=look)>
<alt>
break;}; separator="\n">
default:
<error>
}
>>
One more issue/question: if no rule alternative is found (e.g. with ${???} ), the node that will be created on the parse tree will be of type “alternative” base class (here VariableExprContext). But there is no overload in the generated visitor to visit this class. I understand it could be a bit confusing at first but still I think it makes sense to have it (or find a way to make this base class abstract which could solve both the visitor and missing node issues).
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 21 (21 by maintainers)
Commits related to this issue
- add unit test for #1298 — committed to parrt/antlr4 by parrt 8 years ago
- copyFrom copies error nodes over. Fixes #1298 — committed to parrt/antlr4 by parrt 8 years ago
- Implement the fix of #1298 for Python targets. The patch adapts the fix in PR #1472 for Python targets. — committed to renatahodovan/antlr4 by renatahodovan 8 years ago
- Implement the fix of #1298 for Python targets. The patch adapts the fix in PR #1472 for Python targets. — committed to renatahodovan/antlr4 by renatahodovan 8 years ago
- Merge pull request #1476 from renatahodovan/python-target-1472 Implement the fix of #1298 for Python targets. — committed to antlr/antlr4 by parrt 8 years ago
- Merge pull request #1494 from ericvergnaud/fix-#1298-for-JavaScript Fix #1298 for JavaScript — committed to antlr/antlr4 by parrt 8 years ago
- Fixes #1298 — committed to janyou/antlr4 by janyou 8 years ago
- Merge pull request #1508 from janyou/Fixes-1298 Fixes #1298 for Swift target — committed to antlr/antlr4 by parrt 8 years ago
- Fix #1298 for CSharp — committed to lionelplessis/antlr4 by lionelplessis 7 years ago
- Merge pull request #1655 from lionelplessis/fix-#1298-for-csharp Fix #1298 for CSharp — committed to antlr/antlr4 by parrt 7 years ago
hi @parrt, 4.6 doesn’t fix the tree. It makes it easier to create a simple grammar though (thanks to the
sync()
call in LL(1)). So here goes:And test input:
${ ? a ?}
Gives correct error messages:
But the tree still doesn’t have the first ? as error node:
(r ${ (v a) ? })
.See my comments inline the Antlr generated code below: