First I tried to recognise a normal word, and the grammar below works fine:
grammar Test;
myToken: WORD;
WORD: (LOWERCASE | UPPERCASE )+ ;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
fragment DIGIT: '0'..'9' ;
WHITESPACE : (' ' | '\t')+;
But as soon as I added the parser rule below just beneath "myToken", even my WORD tokens were no longer recognised for the input string "abc":
ALPHA_NUMERIC_WS: ( WORD | DIGIT | WHITESPACE)+;
Does anyone have any idea why that is?
This is because of how ANTLR's lexer resolves ambiguities: it prefers the rule that matches the longest piece of input, and when several rules match the same amount of input, it works "first come, first served". That means the rule specified first in the source code wins, and the other ones are not used for that input.
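To illustrate the role of rule order, here is a minimal sketch with two hypothetical rules, IF and ID, that are not part of the grammar in question. Both can match the input "if", and since both match the same two characters, the rule listed first decides the token type:

```antlr
lexer grammar OrderDemo;

// Listed first, so it wins whenever IF and ID match the same text.
IF : 'if' ;

// Also matches "if", but only produces a token for other words such as "iffy",
// where it matches more characters than IF does.
ID : [a-z]+ ;

// Skip blanks between words.
WS : [ \t\r\n]+ -> skip ;
```

With this order, the input "if" yields an IF token. If ID were listed before IF, ID would win the tie and IF would never be produced.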
In your case, ALPHA_NUMERIC_WS matches the same content as WORD (and more), and because it is specified before WORD, WORD will never be used to match the input: there is no input that can be matched by WORD that can't also be matched by ALPHA_NUMERIC_WS, which is processed first. (The same applies to the WHITESPACE and DIGIT rules.)
I guess that what you want is not to create an ALPHA_NUMERIC_WS token (as is done by specifying it as a lexer rule) but to make it a parser rule instead, so that it can be referenced from another parser rule to allow an arbitrary sequence of WORDs, DIGITs and WHITESPACEs. Therefore you'd want to write it as a parser rule, i.e. with a name that starts with a lower-case letter.
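A minimal sketch of the grammar with that change could look as follows. It assumes the rule is renamed to alpha_numeric_ws (a hypothetical name; any name starting with a lower-case letter makes it a parser rule) and that DIGIT is promoted from a fragment to a regular token, since a fragment never produces a token of its own for the parser to reference:

```antlr
grammar Test;

// Parser rules (names start with a lower-case letter)
myToken          : WORD ;
alpha_numeric_ws : ( WORD | DIGIT | WHITESPACE )+ ;

// Lexer rules (names start with an upper-case letter)
WORD       : ( LOWERCASE | UPPERCASE )+ ;
DIGIT      : '0'..'9' ;            // assumption: no longer a fragment
WHITESPACE : ( ' ' | '\t' )+ ;

// Helper rules that only exist inside other lexer rules
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
```

Here the lexer only produces WORD, DIGIT and WHITESPACE tokens, which no longer overlap, and the parser is the one that combines them into a sequence.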
If you actually do want to create such a token, you can either remove the rules that follow it, or you need to think about what a lexer's job is and where to draw the line between lexer and parser (you would need to redesign your grammar for this to work).
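As one possible reading of the first option, here is a sketch in which ALPHA_NUMERIC_WS is the only token the lexer produces. It assumes that WORD and WHITESPACE are demoted to fragments rather than deleted outright (so ALPHA_NUMERIC_WS can still refer to them) and that myToken is changed to reference the new token; both changes are assumptions made for illustration:

```antlr
grammar Test;

// The parser now only ever sees ALPHA_NUMERIC_WS tokens.
myToken : ALPHA_NUMERIC_WS ;

ALPHA_NUMERIC_WS : ( WORD | DIGIT | WHITESPACE )+ ;

// Helper rules: fragments never produce tokens of their own,
// so they can no longer compete with ALPHA_NUMERIC_WS.
fragment WORD       : ( LOWERCASE | UPPERCASE )+ ;
fragment WHITESPACE : ( ' ' | '\t' )+ ;
fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;
fragment DIGIT      : '0'..'9' ;
```

With this variant, an input such as "abc 12" is lexed as a single ALPHA_NUMERIC_WS token, so the parser can no longer tell words, digits and whitespace apart.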