Parsing with parenthesis and different types of ex

2019-09-16 05:52发布

问题:

I'm currently using happy to parse a language, but I don't think the parser is relevant, except to say it's an LALR parser. Here's a small excerpt from the grammar:

ArithExpr -> ArithExpr + ArithExpr
ArithExpr -> ( ArithExpr )
ArithExpr -> ...

BoolExpr -> ArithExpr == ArithExpr
BoolExpr -> ( BoolExpr )
BoolExpr -> ...

The problem is that I'm getting reduce-reduce conflicts. I think the issue arises when I try to parse something like the following:

( ( 2 + 3 ) == ( 4 + 5 ) ) 

There's only one way to parse this expression, but the problem is I think even at the first parenthesis the parser starts having some trouble. The reason I think this is the case is that the parser does not know whether it's facing a ArithExpr or a BoolExpr in the future, and as it's only got one token of lookahead it has to make an arbitrary choice which may be wrong.

Is there anyway to rewrite the grammar to accept this language? Or should I really just be parsing both ArithExpr and BoolExpr just as one uniform Expr and dealing with the actual types during type checking?

回答1:

You should just parse Expr and do the type checking during semantic analysis. Otherwise, you will have really a hard time dealing with either parenthesized expressions (you can't tell what type they are until too late) or first-class boolean values (a variable might have a boolean value, no?).

See my answer here for an alternative (but it ends up giving the same advice); I provide the link for completeness only because I'm really not convinced of the value of the techniques described in that answer, but I think it is essentially the same question with a different LALR parser generator.