Explanation and solution for JavaCC's warning

I am teaching myself to use JavaCC in a hobby project, and have a simple grammar to write a parser for. Part of the parser includes the following:

    TOKEN : { < DIGIT : (["0"-"9"]) > }
    TOKEN : { < INTEGER : (<DIGIT>)+ > }
    TOKEN : { < INTEGER_PAIR : (<INTEGER>){2} > }
    TOKEN : { < FLOAT : (<NEGATE>)? <INTEGER> | (<NEGATE>)? <INTEGER>  "." <INTEGER>  | (<NEGATE>)? <INTEGER> "." | (<NEGATE>)? "." <INTEGER> > }
    TOKEN : { < FLOAT_PAIR : (<FLOAT>){2} > }
    TOKEN : { < NUMBER_PAIR : <FLOAT_PAIR> | <INTEGER_PAIR> > }
    TOKEN : { < NEGATE : "-" > }

When compiling with JavaCC I get the output:

    Warning: Regular Expression choice : FLOAT_PAIR can never be matched as : NUMBER_PAIR
    Warning: Regular Expression choice : INTEGER_PAIR can never be matched as : NUMBER_PAIR

I'm sure this is a simple concept but I don't understand the warning, being a novice in both parser generation and regular expressions.

What does this warning mean (in as-novice-as-you-can-get terms)?

Tags: regex parsing javacc

4 Answers

姐就是有狂的资本

#2 · 2019-06-21 04:58

Thanks to Barry Kelly's answer, the solution I've come up with is:

    SKIP : { < #TO_SKIP : " " | "\t" > }
    TOKEN : { < #DIGIT : (["0"-"9"]) > }
    TOKEN : { < #DIGITS : (<DIGIT>)+ > }
    TOKEN : { < INTEGER : <DIGITS> > }
    TOKEN : { < INTEGER_PAIR : (<INTEGER>) (<TO_SKIP>)+ (<INTEGER>) > }
    TOKEN : { < FLOAT : (<NEGATE>)?<DIGITS>"."<DIGITS> | (<NEGATE>)?"."<DIGITS> > } 
    TOKEN : { < FLOAT_PAIR : (<FLOAT>) (<TO_SKIP>)+ (<FLOAT>) > }
    TOKEN : { < #NUMBER : <FLOAT> | <INTEGER> > }
    TOKEN : { < NUMBER_PAIR : (<NUMBER>) (<TO_SKIP>)+ (<NUMBER>) >}
    TOKEN : { < NEGATE : "-" > }

I had completely forgotten to include the space that separates the two tokens. I've also used the '#' symbol, which marks a definition as private: it is never matched as a token on its own and can only be used inside the definitions of other tokens. JavaCC compiles the above without warnings or errors.

However, as noted by Barry, there are reasons against doing this.

叛逆

#3 · 2019-06-21 05:04

I haven't used JavaCC, but it is possible that NUMBER_PAIR is ambiguous.

I think the problem comes down to the fact that exactly the same input can be matched as both FLOAT_PAIR and INTEGER_PAIR, since FLOAT can also match an INTEGER.

But this is just a guess having never seen the JavaCC syntax :)

我想做一个坏孩纸

#4 · 2019-06-21 05:17

It probably means that for every FLOAT_PAIR you'll just get a FLOAT_PAIR token, never a NUMBER_PAIR token. The FLOAT_PAIR rule already matches all the input and JavaCC will not try to find further matching rules. That would be my interpretation, but I don't know JavaCC, so take it with a grain of salt.
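To make that interpretation concrete, here is a minimal Java sketch of the tie-breaking rule JavaCC's documentation describes for its token manager: the longest match wins, and on a tie the rule declared earliest in the file wins. This is not JavaCC code. The class `FirstRuleWins`, the `winner` helper, and the simplified space-separated patterns are all made up for illustration, and the patterns are chosen so that `java.util.regex`'s greedy matching happens to return the longest match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstRuleWins {
    // Returns the name of the rule that wins at the start of the input:
    // the longest match is preferred, and the earliest declaration wins a tie.
    static String winner(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict '>' means a later rule with an equal-length match never replaces an earlier one.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves declaration order, like the order of rules in a grammar file.
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("INTEGER_PAIR", Pattern.compile("[0-9]+ +[0-9]+"));
        rules.put("NUMBER_PAIR", Pattern.compile("[0-9]+ +[0-9]+")); // same strings, declared later
        System.out.println(winner(rules, "12 34")); // prints INTEGER_PAIR
    }
}
```

Because NUMBER_PAIR only matches strings that an earlier rule already matches at the same length, it can never be the winner, which is one plausible reading of what the warning is reporting.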

Maybe you can specify somehow that NUMBER_PAIR is the main production and that you don't want to get any other tokens as results.

Viruses.

#5 · 2019-06-21 05:25

I don't know JavaCC, but I am a compiler engineer.

The FLOAT_PAIR rule is ambiguous. Consider the following text:

0.0

This could be FLOAT 0 followed by FLOAT .0; or it could be FLOAT 0. followed by FLOAT 0; both resulting in FLOAT_PAIR. Or it could be a single FLOAT 0.0.

More importantly, though, you are using lexical analysis with composition in a way that is never likely to work. Consider this number:

12345

This could be parsed as INTEGER 12, INTEGER 345 resulting in an INTEGER_PAIR. Or it could be parsed as INTEGER 123, INTEGER 45, another INTEGER_PAIR. Or it could be INTEGER 12345, another token. The problem exists because you are not requiring white space between the lexical elements of the INTEGER_PAIR (or FLOAT_PAIR).
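Both ambiguities can be checked by brute force. The sketch below is plain Java, not JavaCC; the class `PairSplits` and the regex transcriptions of the question's INTEGER and FLOAT tokens are assumptions made for illustration. It lists every place a string can be cut so that both halves match the element pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PairSplits {
    // Regex transcriptions of the question's INTEGER and FLOAT tokens (an assumption of this sketch).
    static final Pattern INTEGER = Pattern.compile("[0-9]+");
    static final Pattern FLOAT =
        Pattern.compile("-?(?:[0-9]+|[0-9]+\\.[0-9]+|[0-9]+\\.|\\.[0-9]+)");

    // Lists every cut point where both halves fully match the element pattern.
    static List<String> splits(Pattern element, String text) {
        List<String> out = new ArrayList<>();
        for (int i = 1; i < text.length(); i++) {
            String left = text.substring(0, i);
            String right = text.substring(i);
            if (element.matcher(left).matches() && element.matcher(right).matches()) {
                out.add(left + " | " + right);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splits(FLOAT, "0.0"));     // prints [0 | .0, 0. | 0]
        System.out.println(splits(INTEGER, "12345")); // prints [1 | 2345, 12 | 345, 123 | 45, 1234 | 5]
    }
}
```

With no separator required between the two elements, a five-digit string has four valid readings as a pair, and nothing in the rule says which one is meant.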

You should almost never try to handle pairs like this in the lexer. Instead, you should handle plain numbers (INTEGER and FLOAT) as tokens, and handle things like negation and pairing in the parser, where whitespace has been dealt with and stripped.

(For example, how are you going to process "----42"? This is a valid expression in most programming languages, which will correctly calculate multiple negations, but would not be handled by your lexer.)
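As a rough illustration of that division of labour, here is a hand-written Java sketch rather than JavaCC output. The lexer produces only plain tokens (a minus sign, an integer, or a float) and skips whitespace, while the parser handles repeated negation and pairing. The class `PairParser`, its method names, and the token regex are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairParser {
    // Lexer rule: optional whitespace, then "-", a float, or an integer.
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(?:-|[0-9]+\\.[0-9]*|\\.[0-9]+|[0-9]+)");

    // Splits the input into plain tokens; whitespace is dropped here, once.
    static List<String> tokenize(String input) {
        String text = input.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at position " + pos);
            }
            tokens.add(m.group().trim());
            pos = m.end();
        }
        return tokens;
    }

    private final List<String> tokens;
    private int next = 0;

    PairParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // number := ("-")* (INTEGER | FLOAT)
    double number() {
        double sign = 1;
        while (next < tokens.size() && tokens.get(next).equals("-")) {
            sign = -sign; // each minus sign flips the result
            next++;
        }
        if (next >= tokens.size()) {
            throw new IllegalArgumentException("Expected a number");
        }
        return sign * Double.parseDouble(tokens.get(next++));
    }

    // pair := number number
    static double[] parsePair(String input) {
        PairParser parser = new PairParser(tokenize(input));
        double first = parser.number();
        double second = parser.number();
        if (parser.next != parser.tokens.size()) {
            throw new IllegalArgumentException("Unexpected trailing input");
        }
        return new double[] { first, second };
    }

    public static void main(String[] args) {
        double[] pair = parsePair("----42 1.5");
        System.out.println(pair[0] + " " + pair[1]); // prints 42.0 1.5
    }
}
```

Because the pair is assembled from separate tokens, the question of where one number ends and the next begins is settled by the whitespace between them, and "----42" needs no special lexer rule.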

Also, be aware that single-digit integers in your lexer will not be matched as INTEGER; they will come out as DIGIT. I don't know the correct syntax for JavaCC to fix that for you, though. What you want is to define DIGIT not as a token, but simply as something you can use in the definitions of other tokens; alternatively, embed the definition of DIGIT ([0-9]) directly wherever you are using DIGIT in your rules.
