Python/YACC: Resolving a shift/reduce conflict

2020-05-08 18:45发布

问题:

I'm using PLY. Here is one of my states from parser.out:

state 3

    (5) course_data -> course .
    (6) course_data -> course . course_list_tail
    (3) or_phrase -> course . OR_CONJ COURSE_NUMBER
    (7) course_list_tail -> . , COURSE_NUMBER
    (8) course_list_tail -> . , COURSE_NUMBER course_list_tail

  ! shift/reduce conflict for OR_CONJ resolved as shift
    $end            reduce using rule 5 (course_data -> course .)
    OR_CONJ         shift and go to state 7
    ,               shift and go to state 8

  ! OR_CONJ         [ reduce using rule 5 (course_data -> course .) ]

    course_list_tail               shift and go to state 9

I want to resolve this as:

if OR_CONJ is followed by COURSE_NUMBER:
    shift and go to state 7
else:
    reduce using rule 5 (course_data -> course .)

How can I fix my parser file to reflect this? Do I need to handle a syntax error by backtracking and trying a different rule?

The documentation says:

These values are then used to attach a numerical precedence value and associativity direction to each grammar rule. This is always determined by looking at the precedence of the right-most terminal symbol.

What if the rule has no terminals?

UPDATE: The complete grammar:

Grammar

Rule 0     S' -> statement
Rule 1     statement -> course_data
Rule 2     or_phrase -> statement OR_CONJ statement
Rule 3     or_phrase -> course OR_CONJ COURSE_NUMBER
Rule 4     statement -> or_phrase
Rule 5     course_data -> course
Rule 6     course_data -> course course_list_tail
Rule 7     course_list_tail -> , COURSE_NUMBER
Rule 8     course_list_tail -> , COURSE_NUMBER course_list_tail
Rule 9     course -> DEPT_CODE COURSE_NUMBER

回答1:

Your basic problem is that you need two tokens of lookahead to do what you want -- when the input seen so far is a course and the lookahead is a OR_CONJ you don't know whether to reduce the course to a course_data or shift without looking ahead two tokens to the token after the OR_CONJ. There are a number of ways you can deal with this

  • use an LR(2) or LR(k) or GLR parser generator -- any can deal with this.

  • use a lexer hack to do the lookahead -- basically have the lexer return two different OR_CONJ tokens depending on whether the following token is a COURSE_NUMBER or not.

  • factor the grammar to get rid of the conflict, which may result in a grammar that parses something slightly different from what you want (need some extra post-parse checks to reject some invalid constructs) and will generally make the grammar much harder to understand.

Note that your grammar as given is also ambiguous related to which way three or more courses connected in a single statement associate. This is easily fixed by rewriting the grammar into a clearer left-recursive form:

Rule 1    statement -> course
Rule 2    statement -> statement OR_CONJ course
Rule 3    course -> DEPT_CODE course_list
Rule 4    course -> DEPT CODE course_list OR_CONJ COURSE_NUMBER
Rule 5    course_list -> COURSE_NUMBER
Rule 6    course_list -> course_list , COURSE_NUMBER

This could also be rewritten as right-recursive for an LL parser generator, but it still has the 2-token lookahead problem. One way of refactoring it to make that go away would be to make COURSE_NUMBER by itself a valid course and recombine it with the previous course in a post-pass (or give an error if its the first course in a statement). Then rule 4 becomes:

Rule 4    course -> COURSE_NUMBER

and you have no conflicts.