antlr4: Single-token deletion error node not in parse tree

Hi there,

When single-token deletion kicks in and skips a token in a rule with alternative labels (it entered the rule but didn’t find the alternative yet) the error node that is created is not attached to the parse tree. If the extra token is present after a valid rule alternative is found then the error node representing the token is correctly present in the parse tree.

This is due to the ParseRuleContext.CopyFrom method not copying children (at least error nodes). See the comments on the generated code:

private VariableExprContext variableExpr(int _p) throws RecognitionException {
   ParserRuleContext _parentctx = _ctx;
   int _parentState = getState();
   VariableExprContext _localctx = new VariableExprContext(_ctx, _parentState); // Base class of alt rule with labels (correct alt not found yet).
   VariableExprContext _prevctx = _localctx;
   int _startState = 6;
   enterRecursionRule(_localctx, 6, RULE_variableExpr, _p);
   int _la;
   try {
      int _alt;
      enterOuterAlt(_localctx, 1);
      {
      setState(36);
      _errHandler.sync(this); // Single-token deletion skips extra token and adds an error node to the children of _localctx (VariableExprContext).
      switch ( getInterpreter().adaptivePredict(_input,3,_ctx) ) {  // A valid alternative is found.
      case 1:
         {
         _localctx = new LiteralExprContext(_localctx); // Simple fields(properties are copied here but not children (which include the sinlge-token deletion error node).
         _ctx = _localctx;
         _prevctx = _localctx;

         setState(33);
         literal();
         }
         break;

Some more context and a concrete example: We are using Antlr 4.5.3 to build a proper grammar/parser for an existing template engine. Target languages are C# and java but we have been working mainly with C# so far. We have both a lexer and parser files to use Antlr modes. We are writing and testing the grammar using the Antlr plugin for IntelliJ. Note that for this issue the IntelliJ plugin is misleading as it shows the missing error node in the tree which is not the case for the TestRig (-gui) (parse tree vs listener issue??).

In our template engine, dynamic expressions in text must be inside ${ expr } . Expressions mainly support basic operators, a few functions, identifiers and number literals. Our lexer file has a catch-all rule to match any unrecognized character and we skip (hidden channel) whitespace characters.

So expression ${abc} is valid. Expression ${ ? abc ? } with 2 single invalid ‘?’ tokens before and after a valid labeled alternative produces a parse tree without the error node for the first token. See the tree for ${ ? abc ? } : java_2016-09-29_15-46-58

Please note that to reproduce the issue with this simpler version of our grammar I had to generate it with the -xForce-atn option, otherwise it was generating LL1 code which doesn’t call errHandler.sync(this) before finding alternative (see the string template for C# extracts below for LL1 and LL). Which leads me to another issue/question: _why does LL1 generated code not do single-token deletion?*

https://github.com/antlr/antlr4/blob/master/tool/resources/org/antlr/v4/tool/templates/codegen/CSharp/CSharp.stg Line 597

// LL(*) stuff
AltBlock(choice, preamble, alts, error) ::= <<
State = <choice.stateNumber>;
ErrorHandler.Sync(this); // CALLING THE ERROR RECOVERY STRATEGY?
<if(choice.label)><labelref(choice.label)> = TokenStream.Lt(1);<endif>
<preamble; separator="\n">
switch ( Interpreter.AdaptivePredict(TokenStream,<choice.decision>,Context) ) {
<alts:{alt |
case <i>:
<alt>
break;}; separator="\n">
}
>>

Line 538

LL1AltBlock(choice, preamble, alts, error) ::= <<
State = <choice.stateNumber>; // NO CALL TO ErrorHandler.Sync(this);
<if(choice.label)><labelref(choice.label)> = TokenStream.Lt(1);<endif>
<preamble; separator="\n">
switch (TokenStream.La(1)) {
<choice.altLook,alts:{look,alt| <cases(ttypes=look)>
<alt>
break;}; separator="\n">
default:
<error>
}
>>

One more issue/question: if no rule alternative is found (e.g. with ${???} ), the node that will be created on the parse tree will be of type “alternative” base class (here VariableExprContext). But there is no overload in the generated visitor to visit this class. I understand it could be a bit confusing at first but still I think it makes sense to have it (or find a way to make this base class abstract which could solve both the visitor and missing node issues).

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 21 (21 by maintainers)

Commits related to this issue

Most upvoted comments

hi @parrt, 4.6 doesn’t fix the tree. It makes it easier to create a simple grammar though (thanks to the sync() call in LL(1)). So here goes:

grammar TestGrammar;

r : '${' v '}' ;

v : A #altA
  | B #altB
  ;

A : 'a' ;
B : 'b' ;

WHITESPACE : [ \n\t\r]+ -> channel(HIDDEN) ;

ERROR : . ;

And test input: ${ ? a ?}

Gives correct error messages:

line 1:3 extraneous input '?' expecting {'a', 'b'}
line 1:7 extraneous input '?' expecting '}'

But the tree still doesn’t have the first ? as error node: (r ${ (v a) ? }).

See my comments inline the Antlr generated code below:

public final VContext v() throws RecognitionException {
        VContext _localctx = new VContext(_ctx, getState());
        enterRule(_localctx, 2, RULE_v);
        try {
            setState(10);
            _errHandler.sync(this); // Single-token deletion skips extra token and adds an error node to the children of _localctx (VContext).
            switch (_input.LA(1)) {
            case A:
                _localctx = new AltAContext(_localctx); // Valid alternative found. Children of initial _localctx are lost.
                enterOuterAlt(_localctx, 1);
                {
                setState(8);
                match(A);
                }
                break;
            case B:
                _localctx = new AltBContext(_localctx);
                enterOuterAlt(_localctx, 2);
                {
                setState(9);
                match(B);
                }
                break;
            default:
                throw new NoViableAltException(this);
            }
        }