I have prepared the following grammar that generates a subset of C logical and integer arithmetic expressions:
Expression:
LogicalOrExpression
LogicalOrExpression ? Expression : LogicalOrExpression
LogicalOrExpression:
LogicalAndExpression
LogicalOrExpression || LogicalAndExpression
LogicalAndExpression:
EqualityExpression
LogicalAndExpression && RelationalExpression
EqualityExpression:
RelationalExpression
EqualityExpression EqualityOperator RelationalExpression
EqualityOperator:
==
!=
RelationalExpression:
AdditiveExpression
RelationalExpression RelationalOperator AdditiveExpression
RelationalOperator:
<
>
<=
>=
AdditiveExpression:
MultiplicativeExpression
AdditiveExpression AdditiveOperator MultiplicativeExpression
AdditiveOperator:
+
-
MultiplicativeExpression:
UnaryExpression
MultiplicativeExpression MultiplicativeOperator UnaryExpression
MultiplicativeOperator:
*
/
%
UnaryExpression:
PrimaryExpression
UnaryOperator UnaryExpression
UnaryOperator:
+
-
!
PrimaryExpression:
BoolLiteral // TERMINAL
IntegerLiteral // TERMINAL
Identifier // TERMINAL
( Expression )
I want to try using shift/reduce parsing and so would like to know what is the smallest k (if any) for which this grammar is LR(k)? (and more generally how to determine the k from an arbitrary grammar if possible?)
From Donald Knuths On the Translation of Languages from Left to Right, in the abstract,
In otherwords,
Therefore, the best we can do in general is try constructing a parser for LR(0), then LR(1), LR(2), etc. At some point you will succeed, or you may at some point give up when k becomes large.
This specific grammar
In this specific case, I happen to know that the grammar you give is LALR(1), which means it must therefore be LR(1). I know this because I have written LALR parsers for similar languages. It can't be LR(0) for obvious reasons (the grammar {A -> x, A -> A + x} is not LR(0)).
The sample grammar is (almost) an operator precedence grammar, or Floyd grammar (FG). To make it an FG, you'd have to macro-expand the non-terminals whose right-hand sides consist of only a single terminal, because operator precedence grammars must be operator grammars, and an operator grammar has the feature that no right-hand side has two consecutive non-terminals.
All operator-precedence grammars are
LR(1)
. It's also trivial to show whether or not an operator grammar has the precedence property, and particularly trivial in the case that every terminal appears in precisely one right-hand side, as in your grammar. An operator grammar in which every terminal appears in precisely one right-hand side is always an operator-precedence grammar [1] and consequently alwaysLR(1)
.FGs are a large class of grammars, some of them even useful (Algol 60, for example, was described by an FG) for which it is easy to answer the question about them being
LR(k)
for somek
, since the answer is always "yes, withK == 1
". Just for precision, here are the properties. We use the normal convention where a grammarG
is a 4-tuple (N, Σ, P, S) whereN
is a set of non-terminals;Σ
is a set of terminals,P
is a set of productions, andS
is the start symbol. We writeV
forN ⋃ Σ
. In any grammar, we have:N ⋂ Σ = ∅
S ∈ N
P ⊂ V+ × V*
The "context-free" requirement restricts
P
so every left-hand-side is a single non-terminal:P ⊂ Σ × V*
In an operator grammar,
P
is further restricted: no right-hand side is either empty, and no right-hand side has two consecutive non-terminals:P ⊂ Σ × (V+ − V*ΣΣV*)
In an operator precedence grammar, we define three precedence relations, ⋖, ⋗ and ≐. These are defined in terms of the relations
Leads
andTrails
[2], where `T Leads V iff T is the first terminal in some string derived from V
T Trails V iff T is the last terminal in some string derived from V
Then:
t1 ⋖ t2 iff ∃v ϶ t2 Leads v ∧ N→V*t1vV* ∈ P
t1 ⋗ t2 iff ∃v ϶ t1 Trails v ∧ N→V*vt2V* ∈ P
t1 ≐ t2 iff N→V*t1t2V* ∈ P ∨ N→V*t1V't2V* ∈ P
An intuitive way of thinking about those relations is this: Normally when we do the derivations, we just substitute RHS for LHS, but suppose we substitute
⋖ RHS ⋗
instead. Then we can modify a derivation by dropping the non-terminals and collapsing strings of consecutive ⋖ and ⋗ to single symbols, and finally adding ≐ between any two consecutive terminals which have no intervening operator. From that, we just read off the relations.Now, we can perform that computation on any operator grammar, but there is nothing which forces the above relations to be exclusive. An operator grammar is a Floyd grammar precisely if those three relations are mutually exclusive.
Verifying that an operator grammar has mutually exclusive precedence relations is straight-forward;
Leads
andTrails
require a transitive closure overFirst
andLast
, which is roughlyO(|G|2)
(it's actually the product of the number of non-terminals and the number of productions); from there, the precedence relations can be computed with a single linear scan over all productions in the grammar, which isO(|G|)
.