Antlr4 grammar ambiguity

2019-09-04 18:30发布

I have the following grammar ( minimized for SO)

grammar Hello;

odataIdentifier  : identifierLeadingCharacter identifierCharacter*; 


identifierLeadingCharacter : Alpha| UNDERSCORE;
identifierCharacter :  identifierLeadingCharacter | Digit;
identifierUnreserved    : identifierCharacter | (MINUS | DOT | TILDE);

Digit  : ZERO_TO_FIVE |[6-9];

ONEHUNDRED_TO_ONEHUNDREDNINETYNINE : '1' Digit Digit;            // 100-199
TWOHUNDRED_TO_TWOHUNDREDFOURTYNINE : '2' ZERO_TO_FOUR Digit;     // 200-249
TWOHUNDREDFIFTY_TO_TWOHUNDREDFIFTYFIVE : '25' ZERO_TO_FIVE;      // 250-255
TEN_TO_NINETYNINE : ONE_TO_NINE Digit;                           // 10-99

ZERO_TO_ONE : [0-1];
ZERO_TO_TWO : ZERO_TO_ONE | [2];
ZERO_TO_THREE : ZERO_TO_TWO | [3];
ZERO_TO_FOUR : ZERO_TO_THREE | [4];
ZERO_TO_FIVE : ZERO_TO_FOUR | [5];

ONE_TO_TWO  : [1-2];
ONE_TO_THREE  : ONE_TO_TWO | [3];
ONE_TO_FOUR  : ONE_TO_THREE | [4]; 
ONE_TO_NINE  : ONE_TO_FOUR | [5-9]; 

Alpha  : [a-zA-Z];

MINUS : [-];
DOT : '.';
UNDERSCORE : '_';
TILDE : '~';

WS  :  (' '|'\r'|'\t'|'\u000C'|'\n') -> skip
    ;

for input c9 it works fine, but when i have 2 digits for example c10 it says:

extraneous input '92' expecting {<EOF>, Digit, Alpha, '_'}

so i guess it parses 9 and parses 2 and doesn't know if this should be TEN_TO_NINETYNINE or 2 Digit Digit. i am a noob to this, so wondering if my analysis is right and how could i alleviate this ...

标签： antlr4

1条回答

等我变得足够好

2楼-- · 2019-09-04 19:19

Your input is resulting in an Alpha token followed by a TEN_TO_NINETYNINE token. While the parser rule identifierLeadingCharacter does allow the Alpha token, the identifierCharacter rule cannot match a TEN_TO_NINETYNINE token.

The input 10 will always produce a TEN_TO_NINETYNINE token rather than two Digit tokens, because the former matches more of the input and lexer rules are greedy.

0人赞添加讨论(0) 举报

Antlr4 grammar ambiguity

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间