-->

A yacc shift/reduce conflict on an unambiguous gra

2019-01-20 06:21发布

问题:

A piece of code of my gramamar its driveing me crazy.

I have to write a grammar that allow write functions with multiple inputs

e.g.

function
  begin
    a:
       <statments>
    b:
       <statements>
  end

The problem with that its that is statements that are assignments like this

ID = Expresion.

in the following quote you can see the output produced by yacc.

  0  $accept : InstanciasFuncion $end

  1  InstanciasFuncion : InstanciasFuncion InstanciaFuncion
  2                    | InstanciaFuncion

  3  InstanciaFuncion : PuntoEntrada Sentencias

  4  PuntoEntrada : ID ':'

  5  Sentencias : Sentencias Sentencia
  6             | Sentencia

  7  Sentencia : ID '=' ID

State 0

    0 $accept: . InstanciasFuncion $end

    ID  shift, and go to state 1

    InstanciasFuncion  go to state 2
    InstanciaFuncion   go to state 3
    PuntoEntrada       go to state 4

State 1

    4 PuntoEntrada: ID . ':'

    ':'  shift, and go to state 5

State 2

    0 $accept: InstanciasFuncion . $end
    1 InstanciasFuncion: InstanciasFuncion . InstanciaFuncion

    $end  shift, and go to state 6
    ID    shift, and go to state 1

    InstanciaFuncion  go to state 7
    PuntoEntrada      go to state 4

State 3

    2 InstanciasFuncion: InstanciaFuncion .

    $default  reduce using rule 2 (InstanciasFuncion)

State 4

    3 InstanciaFuncion: PuntoEntrada . Sentencias

    ID  shift, and go to state 8

    Sentencias  go to state 9
    Sentencia   go to state 10

State 5

    4 PuntoEntrada: ID ':' .

    $default  reduce using rule 4 (PuntoEntrada)

State 6

    0 $accept: InstanciasFuncion $end .

    $default  accept

State 7

    1 InstanciasFuncion: InstanciasFuncion InstanciaFuncion .

    $default  reduce using rule 1 (InstanciasFuncion)

State 8

    7 Sentencia: ID . '=' ID

    '='  shift, and go to state 11

State 9

    3 InstanciaFuncion: PuntoEntrada Sentencias .
    5 Sentencias: Sentencias . Sentencia

    ID  shift, and go to state 8

    ID        [reduce using rule 3 (InstanciaFuncion)]
    $default  reduce using rule 3 (InstanciaFuncion)

    Sentencia  go to state 12

State 10

    6 Sentencias: Sentencia .

    $default  reduce using rule 6 (Sentencias)

State 11

    7 Sentencia: ID '=' . ID

    ID  shift, and go to state 13

State 12

    5 Sentencias: Sentencias Sentencia .

    $default  reduce using rule 5 (Sentencias)

State 13

    7 Sentencia: ID '=' ID .

    $default  reduce using rule 7 (Sentencia)

Maybe somebody can help me to disambiguate this grammar

回答1:

Bison provides you with at least a hint. In State 9, which is really the only relevant part of the output other than the grammar itself, we see:

State 9

    3 InstanciaFuncion: PuntoEntrada Sentencias .
    5 Sentencias: Sentencias . Sentencia

    ID  shift, and go to state 8

    ID        [reduce using rule 3 (InstanciaFuncion)]
    $default  reduce using rule 3 (InstanciaFuncion)

    Sentencia  go to state 12

There's a shift/reduce conflict with ID, in the context in which the possibilities are:

  1. Complete the parse of an InstanciaFuncion (reduce)

  2. Continue the parse of a Sentencias (shift)

In both of those contexts, an ID is possible. It's easy to construct an example. Consider these two instancias:

f : a = b c = d ...
f : a = b c : d = ...

We've finished with the b and c is the lookahead, so we can't see the symbol which follows the c. Now, have we finished parsing the funcion f? Or should we try for a longer list of sentencias? No se sabe. (Nobody knows.)

Yes, your grammar is unambiguous, so it doesn't need to be disambiguated. It's not LR(1), though: you cannot tell what to do by only looking at the next one symbol. However, it is LR(2), and there is a proof than any LR(2) grammar has a corresponding LR(1) grammar. (For any value of 2 :) ). But, unfortunately, actually doing the transformation is not always very pretty. It can be done mechanically, but the resulting grammar can be hard to read. (See Notes below for references.)

In your case, it's pretty easy to find an equivalent grammar, but the parse tree will need to be adjusted. Here's one example:

InstanciasFuncion : PuntoEntrada 
                  | InstanciasFuncion PuntoEntrada
                  | InstanciasFuncion Sentencia

PuntoEntrada: ID ':' Sentencia

Sentencia : ID '=' ID

It's a curious fact that this precise shift/reduce conflict is a feature of the grammar of bison itself, since bison accepts grammars as written above (i.e. without semi-colons). Posix insists that yacc do so, and bison tries to emulate yacc. Bison itself solves this problem in the scanner, not in the grammar: it's scanner recognizes "ID :" as a single token (even if separated with arbitrary whitespace). That might also be your best bet.


There is an excellent description of the proof than any LR(k) grammar can be covered by an LR(1) grammar, including the construction technique and a brief description of how to recover the original parse tree, in Sippu & Soisalon-Soininen, Parsing Theory, Vol. II (Springer Verlag, 1990) (Amazon). This two-volume set is a great reference for theoreticians, and has a lot of valuable practical information, but its heavy reading and its also a serious investment. If you have a university library handy, there should be a copy of it available. The algorithm presented is due to MD Mickunas, and was published in 1976 in JACM 23:17-30 (paywalled), which you should also be able to find in a good university library. Failing that, I found a very abbreviated description in Richard Marion Schell's thesis.

Personally, I wouldn't bother with all that, though. Either use a GLR parser, or use the same trick bison uses for the same purpose. Or use the simple grammar in the answer above and fiddle with the AST afterwards; it's not really difficult.