Shift/Reduce conflict in Bison, when trying to add

2019-07-31 13:26发布

问题:

I am trying to solve shift/reduce conflict in Bison. I have following grammar rules

new_expr:
    T_NEW class_name_reference optional_generics_list ctor_arguments
        { $$ = zend_ast_create(ZEND_AST_NEW, $2, $4, $3); }
|   T_NEW anonymous_class
        { $$ = $2; }

optional_generics_list:
    /* empty */     { $$ = NULL; }
|   generics_list   { $$ = $1; }

ctor_arguments:
    /* empty */ { $$ = zend_ast_create_list(0, ZEND_AST_ARG_LIST); }
|   argument_list { $$ = $1; }

The problem here lies in that fact, that both optional_generics_list and ctor_arguments can be empty. How can I specify (if I could) that if both optional_generics_list and ctor_arguments are empty, then ctor_arguments should have higher priority. Or maybe my question is not correct, and how can solve this conflict instead.

Some updated info: Maybe output of generated .output file will help:

    State 156 conflicts: 1 shift/reduce

State 156:

  303 new_expr: "new (T_NEW)" class_name_reference . optional_generics_list ctor_arguments

    '<'  shift, and go to state 304

    '<'       [reduce using rule 168 (optional_generics_list)]
    $default  reduce using rule 168 (optional_generics_list)

    optional_generics_list  go to state 305
    generics_list           go to state 306


State 305

  303 new_expr: "new (T_NEW)" class_name_reference optional_generics_list . ctor_arguments

    '('  shift, and go to state 229

    $default  reduce using rule 405 (ctor_arguments)

    argument_list   go to state 546
    ctor_arguments  go to state 552


State 306

  169 optional_generics_list: generics_list .

    $default  reduce using rule 169 (optional_generics_list)

回答1:

The problem indicated by the bison output file is that it is possible that a new_expr could be followed by an < which is not part of a generics_list. That would be the case if, for example, the grammar also contained productions like:

term: new_expr
comparison: term '<' term

(Of course, one would expect a real grammar to have a lot more possibilities than that, but those are the essential ones.)

In other words, if your grammar allows you to compare two newly-constructed objects, then the parser cannot tell whether the < that it sees is the start of a generics_list, or a simple comparison operator after a new_expr where both the generics_list and the ctor_arguments have been omitted:

if new Foo < oldFoo then...

myFoo = new Foo<int>(42)

The simplest fix would be to insist that a new_expr be parenthesized if it is going to be used in an expression.

For what it's worth, C++ handles this by knowing whether names are templated or not. If a name can take template arguments, then a < following the name is interpreted as starting the template argument list; otherwise it's a less-than operator. So if v is templated, then you have to write (new v) < ..., but if v is just a simple typename, you can leave out the parentheses: new int < .... implementing that is tricky; you need some kind of lexical feedback, and you need to impose some restrictions on where template declarations can be placed. C++ has some other unique parsing challenges with similar resolutions. For example, new int * i is an error because the * is parsed as being a pointer type modifier, using a rule which says that the type in a new expression is the longest sequence of tokens parseable as a type.

I'd go for mandatory parentheses when new is used as the left argument of any operator, since it's less confusing for the reader. It also simplifies the grammar, and that's a Good Thing because grammars are not just for parsing; they are an essential part of the documentation of the language, and unnecessarily complicated grammars make the language unnecessarily hard to learn and understand.

Interestingly, in the course of writing the above note about C++, I discovered a bug in the parser of one major C++ compiler. (At least, I think I know which compiler is buggy; it would be more accurate to say that I discovered that two popular compilers have inconsistent behaviour, so one of them must be wrong.) A simpler rule would have had no effect on any non-contrived program, and would have made the bug much less likely (and much easier to verify). So it is not at all clear to me the value added of permitting unparenthesized new operations in the middle of an expression.



标签: c parsing bison