I am trying to write a translator for a Java like language to multiple languages.
Right now I am facing 2 problems:
First is to decompose complex expression in a sequence of basic operation and then translating them to destination language.
For example my starting language can be:
var a = (ln(b) + avg(c))*2
I'd like to traslate it like:
var x1 = log_N(b);
var x2 = average(c);
var x3 = sum(x1, x2);
var a = multiply(x3, 2);
I think i have to use a Tree parser, nut I am not sure how to integrate it with StringTemplate. Moreover i am adding extra variables like x1, x2, and x3 and i don't know how to handle this situation.
The second problem is that one of my destination language is a plsql like language. In this case a need to ensure that all output variables become cursors and pass them to the functions.
For example the expression:
var a = (ln(b) + avg(c))*2
Should be translated in this way:
log_N(x1, b);
average(x2, c);
sum(x3, x1, x2);
multiply(a, x3, 2);
where x1, x2, x3 and a will become output cursors.
Can anyone help me finding the right solution?
Thanks
Integrating StringTemplate into a tree parser is basically the same as integrating it into a token parser: define the
output
option astemplate
, then write the rule productions accordingly.Below is a small tree grammar that uses templates for output. Note that the only meaningful difference between this grammar and the one I described in a previous answer is that this one operates on tree nodes rather than tokens. The templates work the same.
That's the kicker. The easiest way that I know of (which isn't terribly easy) is to first generate the baseline AST using the token parser then use a tree parser to flatten the AST into a list of declarations, each representing expressions from the original input --
x1
,x2
, andx3
in your question.The intermediate stages look like this:
Original Input
Baseline AST
Flattened AST
Standard Template Applied
PLSQL Template Applied
Here's a token parser grammar that simply generates a baseline AST that resembles the one in your question. There's nothing really noteworthy here, it just produces a typical AST.
Here is where it gets ugly (at least the way I choose to do it ;)). The tree grammar below transforms a baseline AST produced from above into a list of declarations that correspond to the expressions of the AST -- the flattened AST. Below this, I'll step through the fun bits of the grammar.
First, notice that the grammar is mostly Java code. There are only five parser rules, and most of them are simple. This is a
filter
tree grammar, so rulesbottomup
andtopdown
are the entry points. Onlybottomup
is necessary in this case, sotopdown
isn't specified. Rulebottomup
is repeated until the output tree is unchanged, which to us means when there are no more declarations to produce and the tree is fully flattened.Second, notice that rule
exit_program
is where the new declarations are being written out to the AST. I'm using a semantic predicate ({newDecls}?
) to ensure thatPROGRAM
is only modified when new declarations are added. Remember how I saidbottomup
is called until no more changes are made? Without this semantic predicate,exit_program
will always modifyPROGRAM
and the tree parsing will never stop processingbottomup
. This is a crude work-around for this special case but it works. The new declarations are inserted at the beginning ofPROGRAM
to ensure that they appear before they are referenced. It's no good definingx1
ten lines after it's expected.Third, notice that rule
exit_op
replaces expressions (likeln(b)
) with declarations (likevar0
). An expression is replaced if one of the following is true:The expression is a binary operation whose operands are both "reduced" (that is, they're both integers or variable ids) and is not the child of a
DECL
node.var a = 1 + 2
is not changed because1 + 2
is the child of a declaration.var b = a + (2 + c)
is changed because(2 + c)
has two "reduced" operands and is not the child of aDECL
node (it's the child of the+
ina + ...
).The expression is a
CALL
that is not the child of aDECL
node.var a = ln(b)
is untouched, butvar a = ln(b) + 3
is changed becauseln(b)
is a child of+
.An expression is stored in
exprs
before it's replaced by an id. It's reconstituted in theexit_program
rule when the rule callsbuildNewDecls
.buildNewDecls
just uses the parser's built-inTreeAdaptor
member (namedadaptor
) to generate theDECL
nodes that appear in the flattened AST. The Javadoc for the adaptor methods does an adequate job explaining what the calls do, so I won't go into the details.Caveat: The parsers produced by the grammars above work fine for the very limited case that you've presented. I don't know what errors they will produce when applied to any broader scenario.
That would be something that the templates could manage for you once your AST is just a flat list of declarations, as shown above.
You'll pass the flattened AST to a template-based tree parser, like the one up top, to produce the different text versions like those you've listed. In this case, a template would accept all the parts of a declaration – the variable name, the operation/method name, and the operands/arguments – and produce text like either
variable = method(arg0, arg1)
ormethod(variable, arg0, arg1)
, depending on the template used. The key is to ensure that the input is flat and that the template receives everything related to the declaration.Here is a test application that ties it all together.
JavaLikeToAstTest.java
Here are two simple StringTemplate group files to handle the translation process.
AstToGlobal.stg
AstToPlsql.stg
The application produces the following output:
There is no code in
AstToTemplate.g
or the templates to handle simple assignments likevar a = 3
, but I think it will be easy enough to add the code to handle it using the op/method assignment as a guide.