I am trying to solve shift/reduce conflict in Bison. I have following grammar rules
new_expr:
T_NEW class_name_reference optional_generics_list ctor_arguments
{ $$ = zend_ast_create(ZEND_AST_NEW, $2, $4, $3); }
| T_NEW anonymous_class
{ $$ = $2; }
optional_generics_list:
/* empty */ { $$ = NULL; }
| generics_list { $$ = $1; }
ctor_arguments:
/* empty */ { $$ = zend_ast_create_list(0, ZEND_AST_ARG_LIST); }
| argument_list { $$ = $1; }
The problem here lies in that fact, that both optional_generics_list and ctor_arguments can be empty. How can I specify (if I could) that if both optional_generics_list and ctor_arguments are empty, then ctor_arguments should have higher priority. Or maybe my question is not correct, and how can solve this conflict instead.
Some updated info:
Maybe output of generated .output file will help:
State 156 conflicts: 1 shift/reduce
State 156:
303 new_expr: "new (T_NEW)" class_name_reference . optional_generics_list ctor_arguments
'<' shift, and go to state 304
'<' [reduce using rule 168 (optional_generics_list)]
$default reduce using rule 168 (optional_generics_list)
optional_generics_list go to state 305
generics_list go to state 306
State 305
303 new_expr: "new (T_NEW)" class_name_reference optional_generics_list . ctor_arguments
'(' shift, and go to state 229
$default reduce using rule 405 (ctor_arguments)
argument_list go to state 546
ctor_arguments go to state 552
State 306
169 optional_generics_list: generics_list .
$default reduce using rule 169 (optional_generics_list)
The problem indicated by the bison output file is that it is possible that a new_expr
could be followed by an < which is not part of a generics_list
. That would be the case if, for example, the grammar also contained productions like:
term: new_expr
comparison: term '<' term
(Of course, one would expect a real grammar to have a lot more possibilities than that, but those are the essential ones.)
In other words, if your grammar allows you to compare two newly-constructed objects, then the parser cannot tell whether the < that it sees is the start of a generics_list
, or a simple comparison operator after a new_expr
where both the generics_list
and the ctor_arguments
have been omitted:
if new Foo < oldFoo then...
myFoo = new Foo<int>(42)
The simplest fix would be to insist that a new_expr
be parenthesized if it is going to be used in an expression.
For what it's worth, C++ handles this by knowing whether names are templated or not. If a name can take template arguments, then a < following the name is interpreted as starting the template argument list; otherwise it's a less-than operator. So if v
is templated, then you have to write (new v) < ...
, but if v
is just a simple typename, you can leave out the parentheses: new int < ...
. implementing that is tricky; you need some kind of lexical feedback, and you need to impose some restrictions on where template declarations can be placed. C++ has some other unique parsing challenges with similar resolutions. For example, new int * i
is an error because the *
is parsed as being a pointer type modifier, using a rule which says that the type in a new
expression is the longest sequence of tokens parseable as a type.
I'd go for mandatory parentheses when new
is used as the left argument of any operator, since it's less confusing for the reader. It also simplifies the grammar, and that's a Good Thing because grammars are not just for parsing; they are an essential part of the documentation of the language, and unnecessarily complicated grammars make the language unnecessarily hard to learn and understand.
Interestingly, in the course of writing the above note about C++, I discovered a bug in the parser of one major C++ compiler. (At least, I think I know which compiler is buggy; it would be more accurate to say that I discovered that two popular compilers have inconsistent behaviour, so one of them must be wrong.) A simpler rule would have had no effect on any non-contrived program, and would have made the bug much less likely (and much easier to verify). So it is not at all clear to me the value added of permitting unparenthesized new
operations in the middle of an expression.