I'm using PLY. Here is one of my states from parser.out:
state 3
(5) course_data -> course .
(6) course_data -> course . course_list_tail
(3) or_phrase -> course . OR_CONJ COURSE_NUMBER
(7) course_list_tail -> . , COURSE_NUMBER
(8) course_list_tail -> . , COURSE_NUMBER course_list_tail
! shift/reduce conflict for OR_CONJ resolved as shift
$end reduce using rule 5 (course_data -> course .)
OR_CONJ shift and go to state 7
, shift and go to state 8
! OR_CONJ [ reduce using rule 5 (course_data -> course .) ]
course_list_tail shift and go to state 9
I want to resolve this as:
if OR_CONJ is followed by COURSE_NUMBER:
shift and go to state 7
else:
reduce using rule 5 (course_data -> course .)
How can I fix my parser file to reflect this? Do I need to handle a syntax error by backtracking and trying a different rule?
The documentation says:
These values are then used to attach a numerical precedence value and associativity direction to each grammar rule. This is always determined by looking at the precedence of the right-most terminal symbol.
What if the rule has no terminals?
UPDATE: The complete grammar:
Grammar
Rule 0 S' -> statement
Rule 1 statement -> course_data
Rule 2 or_phrase -> statement OR_CONJ statement
Rule 3 or_phrase -> course OR_CONJ COURSE_NUMBER
Rule 4 statement -> or_phrase
Rule 5 course_data -> course
Rule 6 course_data -> course course_list_tail
Rule 7 course_list_tail -> , COURSE_NUMBER
Rule 8 course_list_tail -> , COURSE_NUMBER course_list_tail
Rule 9 course -> DEPT_CODE COURSE_NUMBER
Your basic problem is that you need two tokens of lookahead to do what you want -- when the input seen so far is a
course
and the lookahead is aOR_CONJ
you don't know whether to reduce thecourse
to acourse_data
or shift without looking ahead two tokens to the token after theOR_CONJ
. There are a number of ways you can deal with thisuse an LR(2) or LR(k) or GLR parser generator -- any can deal with this.
use a lexer hack to do the lookahead -- basically have the lexer return two different
OR_CONJ
tokens depending on whether the following token is aCOURSE_NUMBER
or not.factor the grammar to get rid of the conflict, which may result in a grammar that parses something slightly different from what you want (need some extra post-parse checks to reject some invalid constructs) and will generally make the grammar much harder to understand.
Note that your grammar as given is also ambiguous related to which way three or more courses connected in a single statement associate. This is easily fixed by rewriting the grammar into a clearer left-recursive form:
This could also be rewritten as right-recursive for an LL parser generator, but it still has the 2-token lookahead problem. One way of refactoring it to make that go away would be to make
COURSE_NUMBER
by itself a validcourse
and recombine it with the previouscourse
in a post-pass (or give an error if its the firstcourse
in astatement
). Then rule 4 becomes:and you have no conflicts.