Determining k of LR(k) from this example?

2019-02-22 08:53发布

I have prepared the following grammar that generates a subset of C logical and integer arithmetic expressions:

Expression:
    LogicalOrExpression
    LogicalOrExpression ? Expression : LogicalOrExpression

LogicalOrExpression:
    LogicalAndExpression
    LogicalOrExpression || LogicalAndExpression

LogicalAndExpression:
    EqualityExpression
    LogicalAndExpression && RelationalExpression

EqualityExpression:
    RelationalExpression
    EqualityExpression EqualityOperator RelationalExpression

EqualityOperator:
    ==
    !=

RelationalExpression:
    AdditiveExpression
    RelationalExpression RelationalOperator AdditiveExpression

RelationalOperator:
    <
    >
    <=
    >=

AdditiveExpression:
    MultiplicativeExpression
    AdditiveExpression AdditiveOperator MultiplicativeExpression

AdditiveOperator:
    +
    -

MultiplicativeExpression:
    UnaryExpression
    MultiplicativeExpression MultiplicativeOperator UnaryExpression

MultiplicativeOperator:
    *
    /
    %

UnaryExpression:
    PrimaryExpression
    UnaryOperator UnaryExpression

UnaryOperator:
    +
    -
    !

PrimaryExpression:
    BoolLiteral    // TERMINAL
    IntegerLiteral // TERMINAL
    Identifier     // TERMINAL
    ( Expression )

I want to try using shift/reduce parsing and so would like to know what is the smallest k (if any) for which this grammar is LR(k)? (and more generally how to determine the k from an arbitrary grammar if possible?)

2条回答
姐就是有狂的资本
2楼-- · 2019-02-22 09:39

From Donald Knuths On the Translation of Languages from Left to Right, in the abstract,

It is shown that the problem of whether or not a grammar is LR(k) for some k is undecidable,

In otherwords,

Given a grammar G, "∃k. G ∊ LR(k)" is undecidable.

Therefore, the best we can do in general is try constructing a parser for LR(0), then LR(1), LR(2), etc. At some point you will succeed, or you may at some point give up when k becomes large.

This specific grammar

In this specific case, I happen to know that the grammar you give is LALR(1), which means it must therefore be LR(1). I know this because I have written LALR parsers for similar languages. It can't be LR(0) for obvious reasons (the grammar {A -> x, A -> A + x} is not LR(0)).

查看更多
甜甜的少女心
3楼-- · 2019-02-22 09:53

The sample grammar is (almost) an operator precedence grammar, or Floyd grammar (FG). To make it an FG, you'd have to macro-expand the non-terminals whose right-hand sides consist of only a single terminal, because operator precedence grammars must be operator grammars, and an operator grammar has the feature that no right-hand side has two consecutive non-terminals.

All operator-precedence grammars are LR(1). It's also trivial to show whether or not an operator grammar has the precedence property, and particularly trivial in the case that every terminal appears in precisely one right-hand side, as in your grammar. An operator grammar in which every terminal appears in precisely one right-hand side is always an operator-precedence grammar [1] and consequently always LR(1).

FGs are a large class of grammars, some of them even useful (Algol 60, for example, was described by an FG) for which it is easy to answer the question about them being LR(k) for some k, since the answer is always "yes, with K == 1". Just for precision, here are the properties. We use the normal convention where a grammar G is a 4-tuple (N, Σ, P, S) where N is a set of non-terminals; Σ is a set of terminals, P is a set of productions, and S is the start symbol. We write V for N ⋃ Σ. In any grammar, we have:

N ⋂ Σ = ∅
S ∈ N
P ⊂ V+ × V*

The "context-free" requirement restricts P so every left-hand-side is a single non-terminal:

P ⊂ Σ × V*

In an operator grammar, P is further restricted: no right-hand side is either empty, and no right-hand side has two consecutive non-terminals:

P ⊂ Σ × (V+ − V*ΣΣV*)

In an operator precedence grammar, we define three precedence relations, ⋖, ⋗ and ≐. These are defined in terms of the relations Leads and Trails [2], where `

T Leads V iff T is the first terminal in some string derived from V
T Trails V iff T is the last terminal in some string derived from V

Then:

t1 ⋖ t2 iff ∃v ϶ t2 Leads v ∧ N→V*t1vV* ∈ P
t1 ⋗ t2 iff ∃v ϶ t1 Trails v ∧ N→V*vt2V* ∈ P
t1 ≐ t2 iff N→V*t1t2V* ∈ P ∨ N→V*t1V't2V* ∈ P

An intuitive way of thinking about those relations is this: Normally when we do the derivations, we just substitute RHS for LHS, but suppose we substitute ⋖ RHS ⋗ instead. Then we can modify a derivation by dropping the non-terminals and collapsing strings of consecutive ⋖ and ⋗ to single symbols, and finally adding ≐ between any two consecutive terminals which have no intervening operator. From that, we just read off the relations.

Now, we can perform that computation on any operator grammar, but there is nothing which forces the above relations to be exclusive. An operator grammar is a Floyd grammar precisely if those three relations are mutually exclusive.

Verifying that an operator grammar has mutually exclusive precedence relations is straight-forward; Leads and Trails require a transitive closure over First and Last, which is roughly O(|G|2) (it's actually the product of the number of non-terminals and the number of productions); from there, the precedence relations can be computed with a single linear scan over all productions in the grammar, which is O(|G|).

查看更多
登录 后发表回答