Recursive Descent precedence parsing - matching lo

2019-07-24 19:40发布

Note: this is a more detailed version of Recursive Descent precedence parsing missing prefix expression

I'm building a simple language parser, and having an issue with lower precedence prefix expressions. Here's an example grammar:

E = E8
E8 = E7 'OR' E8 | E7
E7 = E6 'XOR' E7 | E6
E6 = E5 'AND' E6 | E5
E5 = 'NOT' E5 | E4
E4 = E3 '==' E4 | E3 '!=' E4 | E3
E3 = E2 '<' E3 | E2 '>' E3 | E2
E2 = E1 '+' E2 | E1 '-' E2 | E1 '*' E2 | E1 '+' E2 | E1
E1 = '(' E ')' | 'true' | 'false' | '0'..'9'

However, this grammar doesn't work correctly for the NOT, if it's used as the RHS of a higher precedence infix operator, i.e.:

true == NOT false

This is due to the == operator requiring E3 on the RHS, which cannot be a 'NOT' operation.

I'm unsure the correct way to express this grammar? Is it still possible using this simplistic recursive descent approach, or will I need to move to a more featured algorithm (shunting yard or precedence climbing).

Here are some examples that would need to parse correctly:

  • input true == 1 < 2, output ==(true, <(1, 2))
  • input 1 < 2 == true, output ==(<(1, 2), true)
  • input NOT true == false, output NOT(==(true, false))
  • input true == NOT false, output ==(true, NOT(false)) ** doesn't work
  • input true < NOT false, output <(true, NOT(false)) ** doesn't work

I have attempted to alter the levels E4, E3, and E2 to use E5 on the RHS of the infix expression, as suggested in Recursive Descent precedence parsing missing prefix expression (i.e. E3 '==' E5, E3 '<' E5, etc). However this breaks the precedence between these levels, i.e. true == 1 < 2 would be incorrectly parsed as<(==(true, 1), 2)`.

1条回答
女痞
2楼-- · 2019-07-24 20:12

When sticking to the way your language is defined, you cannot have

true == NOT false 

as a valid term in your language. Because then

NOT false == true

would be ambigous: the parse tree could be either

    NOT
     | 
    ==
   /  \
false true

or

   ==
  /  \
 NOT true
  |
false

Note that

true == NOT (false)

is a valid term in your language. A probably more intuitive definition of your language would be to put the NOT-level from E5 down to E2. Then

true == NOT false 
NOT false == true

are both valid and NOT binds with false. And the alternative meaning of the second expression would be expressed as

NOT (false == true)

If these options still do not satisfy you, you have to change/extend the tool. E.g. the yacc/bison parser allows to explicitely define operator precedences; see e.g. here

查看更多
登录 后发表回答