I'm building a simple language parser, and having an issue with lower precedence prefix expressions. Here's an example grammar:
E = E5
E5 = E4 'OR' E4 | E4
E4 = E3 'AND' E3 | E3
E3 = 'NOT' E3 | E2
E2 = E1 '==' E1 | E1
E1 = '(' E ')' | 'true' | 'false'
However, this grammar doesn't work correctly for the NOT
, if it's used as the RHS of a higher precedence infix operator, i.e.:
true == NOT false
This is due to the ==
operator requiring E1 on the RHS, which cannot be a NOT operation.
I'm unsure the correct way to express this grammar? Is it still possible using this simplistic recursive descent approach, or will I need to move to a more featured algorithm (shunting yard or precedence climbing).
Assuming the following input and expected parses are correct:
- test 1
- input:
true == NOT false
- output:
(true == (NOT false))
- test 2
- input:
NOT true == false
- output:
(NOT (true == false))
- test 3
- input:
NOT true == NOT false
- output:
(NOT (true == (NOT false)))
Here's an (ANTLR4) grammar that does the trick:
grammar Expr;
e : e5;
e5 : e4 'OR' e5 | e4;
e4 : e3 'AND' e4 | e3;
e3 : 'NOT' e3 | e2;
e2 : e1 '==' e3 | e1;
e1 : '(' e ')' | 'true' | 'false';
S : [ \t\r\n] -> skip;
Parses ANTLR created:
1
2
3
Your language is also (unnecessarily) ambiguous. Fixing that helps you fix this problem, too.
Here, D
is shorthand for "disjunction", C
for conjunction, N
for negation, and P
for primary, E
for equality.
D = C | C 'OR' D
C = N | N 'AND' C
N = E | 'NOT' N
E = P | P '==' P
P = '(' E ')' | 'true' | 'false'
Maybe use polish notation?
E = E5
E5 = 'OR' E4 E4 | E4
E4 = 'AND' E3 E3 | E3
E3 = 'NOT' E3 | E2
E2 = '==' E1 E1 | E1
E1 = '(' E ')' | 'true' | 'false'