Overlapping rules - mismatched input

Posted 2019-09-19 07:26

Question:

My grammar (shown below, trimmed down from the original) requires somewhat overlapping rules:

grammar NOVIANum;

statement :  (priorityStatement | integerStatement)* ;

priorityStatement : T_PRIO TwoDigits ;

integerStatement : T_INTEGER Integer ;

WS : [ \t\r\n]+ -> skip ;

T_PRIO : 'PRIO' ;
T_INTEGER : 'INTEGER' ;

Integer: OneToNine Digit*  |  ZERO  ;

TwoDigits : Digit Digit ;

fragment OneToNine : ('1'..'9') ;

fragment Digit: ('0'..'9');

ZERO : [0] ;

so "Integer" and "TwoDigits" overlap to a certain extent.

The following input

INTEGER 10
PRIO 10

results in

line 2:5 mismatched input '10' expecting TwoDigits

when Integer precedes TwoDigits and in

line 1:8 mismatched input '10' expecting Integer

when TwoDigits precedes Integer in the grammar.

Is there a way around this?

Thanks - Alex

Edit:

Thanks @GRosenberg. Your suggestion worked for this small example, of course, but when I integrated it into my full grammar it led, sure enough, to different mismatched input errors.

The reason is another lexer rule that requires the range '[1-4]', so I thought I would be clever and turn the grammar into

grammar NOVIANum;

statement :  (priorityT | integerT | levelT )* ;

priorityT : T_PRIO twoDigits ;

integerT : T_INTEGER integer ;

levelT : T_LEVEL levelNumber  ;

levelNumber : ( ZERO DIGIT ) | ( OneToFour (ZERO | DIGIT) ) ;

integer: ZERO*  ( DIGIT ( DIGIT | ZERO )* ) ;

twoDigits : (ZERO | DIGIT) ( ZERO | DIGIT ) ;

oneToFour : OneToFour (DIGIT | ZERO) ;

WS : [ \t\r\n]+ -> skip ;

T_INTEGER : 'INTEGER' ;
T_LEVEL   : 'LEVEL' ;
T_PRIO    : 'PRIO' ;

DIGIT: OneToFour | FiveToNine ;

ZERO : '0' ;

OneToFour  : [1-4] ;
FiveToNine : [5-9] ;

This still works for the previous inputs, but

INTEGER 350
PRIO 10
LEVEL 01
LEVEL 05
LEVEL 10
LEVEL 49

results in

[@0,0:6='INTEGER',<2>,1:0]
[@1,8:8='3',<5>,1:8]
[@2,9:9='5',<5>,1:9]
[@3,10:10='0',<6>,1:10]
[@4,12:15='PRIO',<4>,2:0]
[@5,17:17='1',<5>,2:5]
[@6,18:18='0',<6>,2:6]
[@7,20:24='LEVEL',<3>,3:0]
[@8,26:26='0',<6>,3:6]
[@9,27:27='1',<5>,3:7]
[@10,29:33='LEVEL',<3>,4:0]
[@11,35:35='0',<6>,4:6]
[@12,36:36='5',<5>,4:7]
[@13,38:42='LEVEL',<3>,5:0]
[@14,44:44='1',<5>,5:6]
[@15,45:45='0',<6>,5:7]
[@16,47:51='LEVEL',<3>,6:0]
[@17,53:53='4',<5>,6:6]
[@18,54:54='9',<5>,6:7]
[@19,55:54='<EOF>',<-1>,6:8]
line 5:6 no viable alternative at input '1'
line 6:6 no viable alternative at input '4'
(statement (integerT INTEGER (integer 3 5 0)) (priorityT PRIO (twoDigits 1 0)) (levelT LEVEL (levelNumber 0 1)) (levelT LEVEL (levelNumber 0 5)) (levelT LEVEL (levelNumber 1 0)) (levelT LEVEL (levelNumber 4 9)))

What am I missing here?

Edit 2:

OK, answering my own question here: of course

DIGIT: OneToFour | FiveToNine ;

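kicks in where it shouldn't, even in this combined form. DIGIT is defined before OneToFour and FiveToNine, so the lexer emits DIGIT for every non-zero digit and never produces an OneToFour token. The only way around this that I can think of would be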

grammar NOVIANum;

statement :  (priorityT | integerT | levelT )* ;

priorityT : T_PRIO twoDigits ;

integerT : T_INTEGER integer ;

levelT : T_LEVEL levelNumber  ;

levelNumber : ( ZERO (OneToFour | FiveToNine) | ( OneToFour (ZERO | (OneToFour | FiveToNine)) ) ) ;

integer: ZERO*  ( (OneToFour | FiveToNine) ( (OneToFour | FiveToNine) | ZERO )* ) ;

twoDigits : (ZERO | (OneToFour | FiveToNine)) ( ZERO | (OneToFour | FiveToNine) ) ;

WS : [ \t\r\n]+ -> skip ;

T_INTEGER : 'INTEGER' ;
T_LEVEL   : 'LEVEL' ;
T_PRIO    : 'PRIO' ;

// DIGIT: OneToFour | FiveToNine;

ZERO : '0' ;

OneToFour  : [1-4] ;
FiveToNine : [5-9] ;

because when I create a parser rule for it like

oneToNine : OneToFour | FiveToNine ;

it'll give me this

(integerT INTEGER (integer (oneToNine 3) (oneToNine 5) 0))

which is ugly and harder to handle than just

(integerT INTEGER (integer 3 5 0))

Answer 1:

As a general matter of design, always try to handle distinguishing elements and their objects (T_PRIO -> TwoDigits) at the same level, either parser or lexer. The lexer tokenizes without knowing what the parser expects, so it cannot choose between two rules that match the same text. Presuming the semantic nature of the Integer and TwoDigits rules is important, promote them to the parser and let the lexer produce only digits. That is, don't over-constrain the lexer.

In the parser, you can let the integer rule functionally hide the twoDigits rule except in the evaluation of the priorityStatement rule:

priorityStatement : T_PRIO twoDigits ;

integerStatement : T_INTEGER integer ;

integer: ZERO | ( DIGIT ( DIGIT | ZERO )* ) ;

twoDigits : ( ZERO | DIGIT ) ( ZERO | DIGIT ) ;  // ZERO included so that inputs such as 10 match

T_PRIO : 'PRIO' ;
T_INTEGER : 'INTEGER' ;
DIGIT : [1-9] ;
ZERO : '0' ;
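
Combined with the statement and WS rules from the question, the fragment above can be assembled into a complete grammar. The following is only a sketch under a few assumptions that are not part of the original answer: the trailing EOF in statement is added so that the parser must consume the whole input, and twoDigits is written to accept ZERO in either position so that inputs such as 10 or 05 can match.

```antlr
grammar NOVIANum;

// Parser rules: digit-count constraints live here, where the
// preceding keyword supplies the context the lexer lacks.
// Assumption: EOF is added so that trailing input is not silently ignored.
statement         : ( priorityStatement | integerStatement )* EOF ;

priorityStatement : T_PRIO twoDigits ;
integerStatement  : T_INTEGER integer ;

// Either a lone zero, or a non-zero digit followed by any digits.
integer           : ZERO | DIGIT ( DIGIT | ZERO )* ;

// Assumption: any two digits, zero included.
twoDigits         : ( ZERO | DIGIT ) ( ZERO | DIGIT ) ;

// Lexer rules: keywords and single digits only, so no two rules
// can ever match the same text.
T_PRIO    : 'PRIO' ;
T_INTEGER : 'INTEGER' ;
DIGIT     : [1-9] ;
ZERO      : '0' ;
WS        : [ \t\r\n]+ -> skip ;
```

With this layout the sample input from the question (INTEGER 10 followed by PRIO 10) lexes as keyword, DIGIT, ZERO on each line, and the parser decides from the keyword which number rule applies. One trade-off to be aware of: because whitespace is skipped and every digit is its own token, PRIO 1 0 is accepted just like PRIO 10. If that matters, the digit spacing would have to be checked separately.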


Tags: antlr4 lexer