If you look at the ObjectiveC antlr v3 grammars (http://www.antlr3.org/grammar/1212699960054/ObjectiveC2ansi.g), and many of the other popular grammars out there they do a similar structure to this for solving conditionals
conditional_expression : logical_or_expression
('?' logical_or_expression ':' logical_or_expression)? ;
constant_expression : conditional_expression ;
logical_or_expression : logical_and_expression
('||' logical_and_expression)* ;
logical_and_expression : inclusive_or_expression
('&&' inclusive_or_expression)* ;
inclusive_or_expression : exclusive_or_expression
('|' exclusive_or_expression)* ;
exclusive_or_expression : and_expression ('^' and_expression)* ;
and_expression : equality_expression ('&' equality_expression)* ;
equality_expression : relational_expression
(('!=' | '==') relational_expression)* ;
relational_expression : shift_expression
(('<' | '>' | '<=' | '>=') shift_expression)* ;
shift_expression : additive_expression (('<<' | '>>') additive_expression)* ;
additive_expression : multiplicative_expression
(('+' | '-') multiplicative_expression)* ;
multiplicative_expression : cast_expression
(('*' | '/' | '%') cast_expression)* ;
cast_expression : '(' type_name ')' cast_expression | unary_expression ;
unary_expression
: postfix_expression
| '++' unary_expression
| '--' unary_expression
| unary_operator cast_expression
| 'sizeof' ('(' type_name ')' | unary_expression) ;
unary_operator : '&' | '*' | '-' | '~' | '!' ;
If you read it you'll notice they do this very long 1 to 1 chain of conditionals from conditional_expression
to logical_or_expression
to logical_and_expression
to inclusive_or_expression
to exclusive_or_expression
.
Now, I am quite naive when it comes to ANTLR but this strikes me as an odd way to parse conditionals. It seems very complicated for the definition of a logical_or_expression to twist through every single other conditional expression type. Afterall, what does the definition of a logical OR
have to do with a left bitwise shift?
Is there perhaps a better way, or is there a specific reason this method is required?
As already mentioned, the "chain" is needed to properly handle operator precedence. Without it, input like 1+2*3
would be parsed as:
*
/ \
+ 3
/ \
1 2
instead of:
+
/ \
1 *
/ \
2 3
Since ANTLR 4 supports direct left recursive rules:
foo
: foo '?' foo
| TOKEN
;
so not indirect left recursive rules:
foo
: bar
| TOKEN
;
bar
: foo '?' foo
;
You can rewrite these rules as follows:
expression
: '-' expression
| '(' type_name ')' expression
| expression ('*' | '/' | '%') expression
| expression ('+' | '-') expression
| expression ('<<' | '>>') expression
| expression ('<' | '>' | '<=' | '>=') expression
| expression ('!=' | '==') expression
| expression '&' expression
| expression '^' expression
| expression '|' expression
| expression '&&' expression
| expression '||' expression
| expression '?' expression ':' expression
| IDENTIFIER
| NUMBER
;
If the parser now stumbles upon an expression
, it will first look for ('*' | '/' | '%')
, and if that's not there, it will look for ('+' | '-')
, etc. In other words, the alternatives placed first in the rule will get precedence over alternatives placed lower in the rule.
Now I know from your earlier question, Once grammar is complete, what's the best way to walk an ANTLR v4 tree?, that you're using a listener to "walk" the tree. If you create an expression
rule as I just showed, you'd need to do a lot of manual inspections in your enterExpression(...)
and exitExpression(...)
methods to find out which of the alternatives matched an expression
. This is where "labels" come in handy. You simply label each alternative in your expression
rule:
expression
: '-' expression #unaryExpr
| '(' type_name ')' expression #castExpr
| expression ('*' | '/' | '%') expression #multExpr
| expression ('+' | '-') expression #addExpr
| expression ('<<' | '>>') expression #...
| expression ('<' | '>' | '<=' | '>=') expression
| expression ('!=' | '==') expression
| expression '&' expression
| expression '^' expression
| expression '|' expression
| expression '&&' expression
| expression '||' expression
| expression '?' expression ':' expression
| IDENTIFIER
| NUMBER
;
(note that when you label one, you must label them all!)
And then the base listener class will have enter
- and exit
method for all alternatives:
public void enterUnaryExpr(...)
public void exitUnaryExpr(...)
public void enterCastExpr(...)
public void exitCastExpr(...)
public void enterMultExpr(...)
public void exitMultExpr(...)
...
There is a very good reason for doing it this way: operator precedence. Taking your example of the logical OR and left bitwise shift, think about something like
if (a << b || c)
Objective-C precedence rules say that the '<<' has precedence, so the correct way to evaluate this is
(a << b) || c
The parser rules manage this by using the chain you mention, because the rule for '||' is higher up in the chain, the parse correctly gives a << b as a subexpression for the || operator.
There is no better way in Antl3, however in Antlr4, there is, as Antlr4 allows directly left recursive rules. I highly recommend the "Definitive Antlr4 reference" as it has a very good explanation of this issue.