I am teaching myself to use JavaCC in a hobby project, and have a simple grammar to write a parser for. Part of the parser includes the following:
TOKEN : { < DIGIT : (["0"-"9"]) > }
TOKEN : { < INTEGER : (<DIGIT>)+ > }
TOKEN : { < INTEGER_PAIR : (<INTEGER>){2} > }
TOKEN : { < FLOAT : (<NEGATE>)? <INTEGER> | (<NEGATE>)? <INTEGER> "." <INTEGER> | (<NEGATE>)? <INTEGER> "." | (<NEGATE>)? "." <INTEGER> > }
TOKEN : { < FLOAT_PAIR : (<FLOAT>){2} > }
TOKEN : { < NUMBER_PAIR : <FLOAT_PAIR> | <INTEGER_PAIR> > }
TOKEN : { < NEGATE : "-" > }
When compiling with JavaCC I get the output:
Warning: Regular Expression choice : FLOAT_PAIR can never be matched as : NUMBER_PAIR
Warning: Regular Expression choice : INTEGER_PAIR can never be matched as : NUMBER_PAIR
I'm sure this is a simple concept but I don't understand the warning, being a novice in both parser generation and regular expressions.
What does this warning mean (in as-novice-as-you-can-get terms)?
Thanks to Barry Kelly's answer, the solution I've come up with is:
I had completely forgotten to include the space that separates the two tokens. I've also used the '#' symbol, which stops a token from being matched on its own, so that it is only used in the definitions of other tokens. The above compiles with JavaCC without warnings or errors.
However, as noted by Barry, there are reasons against doing this.
I haven't used JavaCC, but it is possible that NUMBER_PAIR is ambiguous.
I think the problem comes down to the fact that the same exact thing can be matched as both FLOAT_PAIR and INTEGER_PAIR since FLOAT can match an INTEGER.
But this is just a guess having never seen the JavaCC syntax :)
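The overlap guessed at above can be checked outside JavaCC. The following is only a rough sketch: it hand-translates the question's token definitions into java.util.regex patterns, which is an approximation of how JavaCC treats them, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper: approximates the question's tokens with java.util.regex.
final class PairOverlapSketch {
    // INTEGER : (<DIGIT>)+
    static final String INTEGER = "[0-9]+";
    // FLOAT : optional "-" followed by digits, digits.digits, digits. or .digits
    static final String FLOAT =
        "-?[0-9]+\\.[0-9]+|-?[0-9]+\\.|-?\\.[0-9]+|-?[0-9]+";

    // INTEGER_PAIR : (<INTEGER>){2} and FLOAT_PAIR : (<FLOAT>){2}
    static final Pattern INTEGER_PAIR = Pattern.compile("(?:" + INTEGER + "){2}");
    static final Pattern FLOAT_PAIR = Pattern.compile("(?:" + FLOAT + "){2}");

    // True when the whole string can be read as two adjacent INTEGERs.
    static boolean isIntegerPair(String s) {
        return INTEGER_PAIR.matcher(s).matches();
    }

    // True when the whole string can be read as two adjacent FLOATs.
    static boolean isFloatPair(String s) {
        return FLOAT_PAIR.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "12", "12345", "1.5" }) {
            System.out.println(s + " INTEGER_PAIR=" + isIntegerPair(s)
                + " FLOAT_PAIR=" + isFloatPair(s));
        }
    }
}
```

Under this approximation, any run of two or more digits satisfies both pair patterns, which is the overlap described above, while a string such as 1.5 satisfies only the FLOAT_PAIR pattern.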
It probably means that for every FLOAT_PAIR you'll just get a FLOAT_PAIR token, never a NUMBER_PAIR token. The FLOAT_PAIR rule already matches all the input and JavaCC will not try to find further matching rules. That would be my interpretation, but I don't know JavaCC, so take it with a grain of salt.
Maybe you can specify somehow that NUMBER_PAIR is the main production and that you don't want to get any other tokens as results.
I don't know JavaCC, but I am a compiler engineer.
The FLOAT_PAIR rule is ambiguous. Consider the following text:
0.0
This could be FLOAT 0 followed by FLOAT .0; or it could be FLOAT 0. followed by FLOAT 0; both resulting in FLOAT_PAIR. Or it could be a single FLOAT 0.0.
More importantly, though, you are using lexical analysis with composition in a way that is never likely to work. Consider this number:
12345
This could be parsed as INTEGER 12, INTEGER 345, resulting in an INTEGER_PAIR. Or it could be parsed as INTEGER 123, INTEGER 45, another INTEGER_PAIR. Or it could be INTEGER 12345, another token. The problem exists because you are not requiring white space between the lexical elements of the INTEGER_PAIR (or FLOAT_PAIR).
You should almost never try to handle pairs like this in the lexer. Instead, you should handle plain numbers (INTEGER and FLOAT) as tokens, and handle things like negation and pairing in the parser, where whitespace has been dealt with and stripped.
(For example, how are you going to process "----42"? This is a valid expression in most programming languages, which will correctly calculate multiple negations, but would not be handled by your lexer.)
Also, be aware that single-digit integers in your lexer will not be matched as INTEGER; they will come out as DIGIT. I don't know the correct syntax for JavaCC to fix that for you, though. What you want is to define DIGIT not as a token, but simply as something you can use in the definitions of other tokens; alternatively, embed the definition of DIGIT ([0-9]) directly wherever you are using DIGIT in your rules.
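To make the "plain numbers in the lexer, negation and pairing in the parser" split concrete, here is a minimal hand-written sketch in plain Java rather than JavaCC. It is not generated code and not the asker's grammar: the class name, the token regex, and the two-rule grammar in the comments are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the lexer only knows unsigned numbers and "-",
// while the parser handles negation and pairing.
final class NumberPairSketch {
    // Leading whitespace is skipped; group 1 is a number, group 2 is a minus sign.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(?:([0-9]+\\.?[0-9]*|\\.[0-9]+)|(-))");

    private final List<String> tokens;
    private int next = 0;

    private NumberPairSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Lexer: split the input into number and "-" tokens, dropping whitespace.
    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                if (input.substring(pos).trim().isEmpty()) {
                    break; // only trailing whitespace is left
                }
                throw new IllegalArgumentException("Unexpected input at offset " + pos);
            }
            result.add(m.group(1) != null ? m.group(1) : m.group(2));
            pos = m.end();
        }
        return result;
    }

    // Parser rule: number := "-" number | NUMBER
    private double number() {
        if (next >= tokens.size()) {
            throw new IllegalArgumentException("Expected a number");
        }
        String t = tokens.get(next++);
        if (t.equals("-")) {
            return -number(); // each "-" negates whatever follows it
        }
        return Double.parseDouble(t);
    }

    // Parser rule: pair := number number <end of input>
    static double[] parsePair(String input) {
        NumberPairSketch p = new NumberPairSketch(tokenize(input));
        double first = p.number();
        double second = p.number();
        if (p.next != p.tokens.size()) {
            throw new IllegalArgumentException("Trailing input after the pair");
        }
        return new double[] { first, second };
    }

    public static void main(String[] args) {
        double[] pair = parsePair("----42 -.5");
        System.out.println(pair[0] + ", " + pair[1]);
    }
}
```

In this sketch "12 345" is read as the pair 12 and 345, whereas "12345" is a single number token and is rejected as a pair, because the separating whitespace is what delimits the two tokens. Repeated negation such as "----42" is handled by the recursive number rule rather than by the lexer.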