I'm attempting to write a grammar to parse templating language say jinja2 (or twig at your choose), and I can't successfully parse switch-case
statement.
Let me show desired syntax:
{% switch username %}
{% case "Jim" %}
I want to say:
{% case "Nik" %}
Hello man!
{% endcase %}
{% case "Bob" %}
Hi
{% default %}
Who are you?
{% endswitch %}
Here endcase just works as break.
Worked portion of my grammar file:
program ::= template_language(Q) . {
status->ret = Q;
}
template_language(R) ::= statement_list(L) . {
R = L;
}
statement_list(R) ::= statement_list(L) statement(S) . {
R = my_list(L, S);
}
statement_list(R) ::= statement(S) . {
R = my_list(NULL, S);
}
statement(R) ::= switch_statement(E) . {
R = E;
}
// empty {% switch expr %} {% endswitch %}
switch_statement(R) ::= OPEN_DELIMITER SWITCH expr(E) CLOSE_DELIMITER OPEN_DELIMITER ENDSWITCH CLOSE_DELIMITER . {
R = my_switch_statement(E, NULL, status->scanner_state);
}
switch_statement(R) ::= OPEN_DELIMITER SWITCH expr(E) CLOSE_DELIMITER case_clauses(C) OPEN_DELIMITER ENDSWITCH CLOSE_DELIMITER . {
R = my_switch_statement(E, C, status->scanner_state);
}
case_clauses(R) ::= case_clauses(C) case_clause(K) . {
R = my_list(C, K);
}
case_clauses(R) ::= case_clause(K) . {
R = my_list(NULL, K);
}
// empty {% case expr %} {% endcase %}
case_clause(R) ::= OPEN_DELIMITER CASE expr(E) CLOSE_DELIMITER OPEN_DELIMITER ENDCASE CLOSE_DELIMITER . {
R = case_clause(E, NULL, status->scanner_state);
}
case_clause(R) ::= OPEN_DELIMITER CASE expr(E) CLOSE_DELIMITER statement_list(T) OPEN_DELIMITER ENDCASE CLOSE_DELIMITER . {
R = case_clause(E, T, status->scanner_state);
}
This is just part of my grammar and I have worked grammar for for
, if
, while
, do
, loop
, etc.
But I have no idea about:
{% case expr %} statement_list(T)
without{% endcase %}
{% default %} statement_list(T)
For example I tried to use:
case_clause(R) ::= OPEN_DELIMITER CASE expr(E) CLOSE_DELIMITER statement_list(T) . {
R = case_clause(E, T, status->scanner_state);
}
for #1 but no luck, got
This rule can not be reduced.
The same for #2
Frankly speaking I'm understand the root of problem - the lack of case/default-bound, but actually I have no idea how to address this issue.
Any help would be greatly appreciated!
The problem is that your grammar is LR(2), not LR(1).
Many shift/reduce conflicts arise because it is impossible to know what to do until the token following a
%{
is seen. For example, consider the partial template (I've deliberately destroyed the indentation):Now, should the bolded section be reduced to
case_clause
?Remember that in an LR(k) grammar, the decision to reduce must be made by looking only at the
k
tokens following the end of the sequence to be reduced. Lemon, like most LR parser generators, only implements LR(1) parsers, so the decision needs to be made using only one lookahead token, which is the%}
. But the decision cannot be made without knowing what the next token is:In the first input, we do not do the reduction, but we have reached the end of the
statement_list
. In the second one, we need to reduce because we've found the entirecase_clause
. And in the third one, we've started a newstatement
which will need to be appended to thestatement_list
.The first and third possible inputs present no problem, because both of them just correspond to a shift action; the necessary reduction -- whichever one it is -- will be performed later. But the second one needs a reduce to happen before the
%{
, and by the time we see thecase
token, it is too late.The simplest solution, it seems to me, is to force the lexer to recognize
{% keyword
as a single token (different for each keyword). For example, the following, which differs from your grammar only in that each instance ofOPEN_DELIMITER FOO
has been replaced byOPEN_FOO
, presents no problems: (I also replaceCLOSE_DELIMITER
withCLOSE
to avoid horizontal scrolling.)As a side-note, I'd suggest simplifying your grammar by not special-casing empty statement-lists. Just allow an empty base case for
statement_list
: