I am writing a simple calculator in yacc / bison.
The grammar for an expression looks somewhat like this:
expr
: NUM
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '+' expr %prec '*' { $$ = $1; }
| '-' expr %prec '*' { $$ = $1; }
| '(' expr ')' { $$ = $2; }
| expr expr { $$ = $1 '*' $2; }
;
I have declared the precedence of the operators like this.
%left '+' '-'
%left '*' '/'
%nonassoc '('
The problem is with the last rule:
expr expr { $$ = $1 $2; }
I want this rule because I want to be able to write expressions like 5(3+4)(3-24)
in my calculator.
Is it possible to make this grammar unambiguous?
The ambiguity results from the fact that you allow unary operators (- expr
), so 2 - 2
can be parsed either as a simple subtraction (yielding 0) or as an implicit product (of 2 and -2, yielding -4).
It's clear that subtraction is intended (otherwise subtraction would be impossible to represent) so it is necessary to ban the production expr: expr expr
if the second expr
on the right-hand side is a unary operation.
That can't be done with precedence declarations (or at least it cannot be done in an obvious way), so the best solution is to write out the grammar explicitly, without relying on precedence to disambiguate.
You will also have to decide exactly what is the precedence of implicit multiplication: either the same as explicit multiplication/division, or stronger. That affects how ab/cd
is parsed. There is no consensus that I know of, so it is more or less up to you.
In the following, I assume that implicit multiplication binds more tightly. I also ensure that -ab
is parsed as (-a)b
, although -(ab)
has the same end result (until you start dealing with things like non-arithmetic types and automatic conversions). So just take it as an example.
term: NUM
| '(' expr ')'
unop: term
| '-' unop
| '+' unop
conc: unop
| conc term
prod: conc
| prod '*' conc
| prod '/' conc
expr: prod
| expr '+' prod
| expr '-' prod