TL;DR VERSION: Is there a parser generator that supports the following: when some rule is reduced (I assume LALR(1) parser), then reduction isn't performed, but parser backs off and replaces input with different code using values from this rule and parses that code. Repeat if needed. So if code is "i++" and rule is "expr POST_INCR", I can do more or less:
expr POST_INCR -> "(tmp = expr; expr = expr + 1; tmp)"
So basicaly code rewriting using macros?
LONG VERSION:
I wrote yet another simple interpreted language (in Java for simplicity). It works ok, but it raised some question. Introduction is pretty long, but simple and helps to shows my problem clearly (I think).
I have "while" loop. It is pretty simple, given:
WHILE LPARE boolean_expression RPAREN statement
I generate more or less the following:
new WhileNode(boolean_expression, statement);
This creates new node that, when visited later, generates code for my virtual machine. But I also have the following:
FOR LPAREN for_init_expr SEMICOLON boolean_expression SEMICOLON for_post_expr RPAREN statement
This is "for loop" known from Java or C. From aforementioned rule I create more or less the following:
new ListNode(
for_init_expr,
new WhileNode(
boolean_expression,
new ListNode(
statement,
new ListNode(for_post_expr, null))))
This is of course simple transformation, from:
for (for_init ; boolean_expression ; for_post_expr)
statement
to:
for_init
while (boolean_expression) {
statement
for_post_expr;
}
All is fine and dandy, but things get hairy for the following:
FOR LPAREN var_decl COLON expression RPAREN statement
This if well known and liked:
for (int x : new int[] { 1, 2 })
print(x);
I refrain from posting code that generates AST, since basic for loop was already a little bit long, and what we get here is ever worse. This construction is equal to:
int[] tmp = new int[] { 1, 2 };
for (int it = 0 ; it < tmp.length; it = it + 1) {
int x = tmp[it];
print(x);
}
And since I'm not using types, I simply assume that "expression" (so right side, after COLON) is something that I can iterate over (and arrays are not iterable), I call a function on a result of this "expression" which returns instance of Iterable. So, in fact, my rewritten code isn't as simple as one above, is more or less this:
Iterator it = makeIterable(new int[] { 1, 2 });
while (it.hasNext()) {
int x = it.next();
print(x);
}
It doesn't look THAT bad, but note that AST for this generates three function calls and while loop. To show you what mess it is, I post what I have now:
113 | FOR LPAREN var_decl_name.v PIPE simple_value_field_or_call.o RPAREN statement.s
114 {: Symbol sv = ext(_symbol_v, _symbol_o);
115 String autoVarName = generateAutoVariableName();
116 Node iter = new StatementEndNode(sv, "",
117 new BinNode(sv, CMD.SET, "=",
118 new VarDeclNode(sv, autoVarName),
119 new CallNode(sv, "()",
120 new BinNode(sv, CMD.DOT, ".",
121 new MkIterNode(sv, o),
122 new PushConstNode(sv, "iterator")))));
123 Node varinit = new StatementEndNode(sv, "",
124 new BinNode(sv, CMD.SET, "=",
125 v,
126 new PushConstNode(sv, "null")));
127 Node hasnext = new CallNode(sv, "()",
128 new BinNode(sv, CMD.DOT, ".",
129 new VarNode(sv, autoVarName),
130 new PushConstNode(sv, "hasNext")));
131 Node vargennext = new StatementEndNode(sv, "",
132 new BinNode(sv, CMD.SET, "=",
133 new VarNode(sv, v.name),
134 new CallNode(sv, "()",
135 new BinNode(sv, CMD.DOT, ".",
136 new VarNode(sv, autoVarName),
137 new PushConstNode(sv, "next")))));
138 return new ListNode(sv, "",
139 new ListNode(sv, "",
140 new ListNode(sv, "",
141 iter
142 ),
143 varinit
144 ),
145 new WhileNode(sv, "while",
146 hasnext,
147 new ListNode(sv, "",
148 new ListNode(sv, "",
149 vargennext
150 ),
151 s)));
To answer your questions: yes, I am ashamed of this code.
QUESTION: Is there are parser generator that let's me do something about it, namely given rule:
FOR LPAREN var_decl COLON expr RPAREN statement
tell parser to rewrite it as if it was something else. I imagine that this would require some kind of LISP's macro mechanism (which is easy in lisp due to basically lack of grammar whatsoever), maybe similar to this:
FOR LPAREN var_decl COLON expr RPAREN statement =
{ with [ x = generateAutoName(); ]
emit [ "Iterator $x = makeIterable($expr).iterator();"
"while (${x}.hasNext()) {"
"$var_decl = ${x}.next();"
"$statement"
"}"
]
}
I don't know if this is a well known problem or not, I simply don't even know what to look for - the most similar question that I found is this one: Any software for pattern-matching and -rewriting source code? but it isn't anywhere close to what I need, since it is supposed to work as a separate step and not during compilation, so it doesn't qualify.
Any help will be appreciated.