Using precedence in Bison for unary minus doesn

2019-04-02 13:53发布

问题:

I'm devising a very simple grammar, where I use the unary minus operand. However, I get a shift/reduce conflict. In the Bison manual, and everywhere else I look, it says that I should define a new token and give it higher precedence than the binary minus operand, and then use "%prec TOKEN" in the rule.

I've done that, but I still get the warning. Why?

I'm using bison (GNU Bison) 2.4.1. The grammar is shown below:

%{
#include <string>
extern "C" int yylex(void);
%}

%union {
    std::string token;
}

%token <token> T_IDENTIFIER T_NUMBER
%token T_EQUAL T_LPAREN T_RPAREN

%right T_EQUAL
%left T_PLUS T_MINUS
%left T_MUL T_DIV
%left UNARY

%start program

%%

program : statements expr
;

statements : '\n'
           | statements line
;

line : assignment
     | expr
;

assignment : T_IDENTIFIER T_EQUAL expr
;

expr : T_NUMBER
     | T_IDENTIFIER
     | expr T_PLUS expr
     | expr T_MINUS expr
     | expr T_MUL expr
     | expr T_DIV expr
     | T_MINUS expr   %prec UNARY
     | T_LPAREN expr T_RPAREN
;

回答1:

%prec doesn't do as much as you might hope here. It tells Bison that in a situation where you have - a * b you want to parse this as (- a) * b instead of - (a * b). In other words, here it will prefer the UNARY rule over the T_MUL rule. In either case, you can be certain that the UNARY rule will get applied eventually, and it is only a question of the order in which the input gets reduced to the unary argument.

In your grammar, things are very much different. Any sequence of line non-terminals will make up a sequence, and there is nothing to say that a line non-terminal must end at an end-of-line. In fact, any expression can be a line. So here are basically two ways to parse a - b: either as a single line with a binary minus, or as two “lines”, the second starting with a unary minus. There is nothing to decide which of these rules will apply, so the rule-based precedence won't work here yet.

Your solution is correcting your line splitting, by requiring every line to actually end with or be followed by an end-of-line symbol.

If you really want the behaviour your grammar indicates with respect to line endings, you'd need two separate non-terminals for expressions which can and which cannot start with a T_MINUS. You'd have to propagate this up the tree: the first line may start with a unary minus, but subsequent ones must not. Inside a parenthesis, starting with a minus would be all right again.



回答2:

The expr rule is ok (without the %prec UNARY). Your shift/reduce conflict comes from the rule:

statements : '\n'
           | statements line
;

The rule does not what you think. For example you can write:

a + b c + d

I think that is not supposed to be valid input.

But also the program rule is not very sane:

program : statements expr
;

The rules should be something like:

program: lines;

lines: line | lines line;

line: statement "\n" | "\n";

statement: assignment | expr;