ANTLR4 not reporting ambiguity

Posted 2019-07-13 05:24

Question:

Given the following grammar:

grammar ReportAmbiguity;

unit : statements+;

statements : 
        callStatement+
        // '.'         // <- uncomment this line
    ;

callStatement : 'CALL' ID (argsByRef | argsByVal)*;

argsByRef : ('BY' 'REF')? ID+;

argsByVal : 'BY' 'VAL' ID+;

ID : ('A'..'Z')+;

WS : (' '|'\n')+ -> channel(HIDDEN);

When the string "CALL FUNCTION BY VAL A B" is parsed through the non-root rule callStatement, everything works and the parser correctly reports an ambiguity:

line 1:24 reportAttemptingFullContext d=6 (argsByVal), input='B'
line 1:24 reportAmbiguity d=6 (argsByVal): ambigAlts={1, 2}, input='B'

The parser correctly outputs the tree: (callStatement CALL FUNCTION (argsByVal BY VAL A B)).
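To make the reported ambiguity concrete, here is a minimal sketch, independent of ANTLR, that counts how many ways the argument tokens can be derived from (argsByRef | argsByVal)*. The class and method names (AmbiguityCount, countCallStatement, and so on) are hypothetical, and the three rules are hand-coded from the grammar above rather than read from it, so treat it as an illustration of where the ambiguity comes from, not as what ANTLR does internally.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ANTLR or of the pet project.
class AmbiguityCount {
    // Literal tokens of the grammar; the lexer gives them priority over ID.
    private static final List<String> KEYWORDS = Arrays.asList("CALL", "BY", "REF", "VAL");

    // ID : ('A'..'Z')+ , excluding the literal keywords.
    static boolean isId(List<String> t, int i) {
        return i < t.size() && t.get(i).matches("[A-Z]+") && !KEYWORDS.contains(t.get(i));
    }

    static boolean isKw(List<String> t, int i, String kw) {
        return i < t.size() && t.get(i).equals(kw);
    }

    // Number of derivations of t[pos..] from (argsByRef | argsByVal)*.
    static long countArgs(List<String> t, int pos) {
        if (pos == t.size()) {
            return 1; // zero further iterations of the loop
        }
        long total = 0;
        // argsByRef : ('BY' 'REF')? ID+
        int refStart = (isKw(t, pos, "BY") && isKw(t, pos + 1, "REF")) ? pos + 2 : pos;
        total += countIdRuns(t, refStart);
        // argsByVal : 'BY' 'VAL' ID+
        if (isKw(t, pos, "BY") && isKw(t, pos + 1, "VAL")) {
            total += countIdRuns(t, pos + 2);
        }
        return total;
    }

    // ID+ starting at pos: try every possible end of the run, then count the rest.
    static long countIdRuns(List<String> t, int pos) {
        long total = 0;
        for (int i = pos; isId(t, i); i++) {
            total += countArgs(t, i + 1);
        }
        return total;
    }

    // callStatement : 'CALL' ID (argsByRef | argsByVal)*
    static long countCallStatement(String input) {
        List<String> t = Arrays.asList(input.trim().split("\\s+"));
        if (!isKw(t, 0, "CALL") || !isId(t, 1)) {
            return 0;
        }
        return countArgs(t, 2);
    }

    public static void main(String[] args) {
        System.out.println(countCallStatement("CALL FUNCTION BY VAL A B")); // prints 2
    }
}
```

The two derivations counted for this input are (argsByVal BY VAL A B) and (argsByVal BY VAL A) (argsByRef B). That choice, between continuing the ID+ loop inside argsByVal and leaving it, is the one the report flags with ambigAlts={1, 2} at input 'B'.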

Now uncomment the line marked above (the 7th line of the grammar) and run the same test again.

The parser still outputs the same tree, but the ambiguity reports are gone. Why is this obviously ambiguous grammar, with such an ambiguous input, no longer reported as ambiguous?

(This is part of a bigger problem. I'm trying to understand this so I can pin down another possible problem with my grammar.)

EDIT 1

Using antlr4 version 4.6.

I've prepared a pet project on GitHub: https://github.com/rslemos/pet-grammars (in module g, run mvn clean test -Dtest=br.eti.rslemos.petgrammars.ReportAmbiguityUnitTest to test the version with the line commented out; uncomment the 7th line and run it again to see it fail).

EDIT 2

Changed unit: statements*; to unit: statements+;. This change by itself does not affect the original problem. It only enables another experiment (further edit pending).

EDIT 3

Another way to trigger this bug is to change unit: statements+; to unit: statements+ unit;.

As with adding '.' to statements, this change also makes antlr4 forgo ambiguity detection.

I think this has something to do with an EOF that possibly follows argsByVal.

The first alternative (append '.' to statements) precludes EOF from appearing just after argsByVal.

The second one (append unit to itself) makes it a non-root rule (and it seems that antlr implicitly appends EOF to every root rule).

I always thought antlr4 rules were meant to be invoked any way we liked, with no rule given special treatment, the root rule being so called only because we (the grammar authors) know which rule is the root.

EDIT 4

Could be related to https://github.com/antlr/antlr4/issues/1545.

Tags: antlr4