I have a expression like this a*(b+c)
and I have successfully parsed into an AST so it finally becomes:
I'm trying to expand the expression it finally becomes
a*b + a*c
, but with no luck.
I would like to know an algorithm to expand the expression, or maybe a library to do it, preferably for .NET.
In Symja you can use the Distribute() or Expand() function for your problem:
package org.matheclipse.core.examples;
import org.matheclipse.core.eval.ExprEvaluator;
import org.matheclipse.core.expression.F;
import org.matheclipse.core.interfaces.IExpr;
import org.matheclipse.parser.client.SyntaxError;
import org.matheclipse.parser.client.math.MathException;
public class ExpandSO54204637 {
public static void main(String[] args) {
try {
ExprEvaluator util = new ExprEvaluator();
IExpr expr = util.eval("a*(b+c)");
IExpr result = util.eval(F.Distribute(expr));
// print: a*b+a*c
System.out.println(result.toString());
result = util.eval(F.Expand(expr));
// print: a*b+a*c
System.out.println(result.toString());
} catch (SyntaxError e) {
// catch Symja parser errors here
System.out.println(e.getMessage());
} catch (MathException me) {
// catch Symja math errors here
System.out.println(me.getMessage());
} catch (Exception e) {
e.printStackTrace();
} catch (final StackOverflowError soe) {
System.out.println(soe.getMessage());
} catch (final OutOfMemoryError oome) {
System.out.println(oome.getMessage());
}
}
}
If you're using the shunting-yard algorithm to construct your AST, whenever you pop a left paren onto the operator stack after a multiplication operator- that's your signal to distribute and expand the expression.
For the image provided, the addition operator moves to be the root node and its children are copies of your current tree with the exception of the addition node being replaced by its own children.
Hope this helps with creating a solution.
I've found that Math.NET has a feature to do that:
SymbolicExpression.Parse("a*(b+c)").Expand();
You can do this by writing procedural code or you can use a source to source transformation program transformation system (PTS).
Writing procedural code just consists of calling support functions to navigate up and down the tree, delete links between nodes, delete nodes, create nodes and link nodes. Any AST library has (should have!) such functions.
So the procedural solution for "a*(b+c)" rewrites to "ab+ac" is this:
Find tree root of "a*(b+c)". That's the "*" node.
Unlink the "a" child and the "b+c" child.
Unlink the "b" and "c" children from the "+" node.
Create a second "*" node.
Link "a" and "b" nodes under the original "*" node, producing subtree "a*b*
Copy the "a" node.
Link the copyied-"a" and "c" under the second "*" node, producing subtree "a*c"
Link the subtrees under the "+" node.
Return "+" as the root of the tree.
There's nothing hard about this, it is just shuffling links.
But it is annoying to write, doesn't help with parsing the expression from the language you probably want to manipulate ("C#"?), doesn't do complicated transforms easily and doesn't help you find this kind of subtree in much larger AST that you are probably trying to modify.
That's why you want a PTS. A good PTS provides parsing machinery to enable construction of parsers for complex languages such as Java, COBOL, C++ or C#. It also provides a way to write pattern-directed rewrites. It gets brownie points if it happens to have seriously validated parsers for the languages you want to manipulate (because otherwise you get the write the parser too on top of your tree manipulation problem).
As an example, with our DMS Software Reengineering Toolkit, you can take advantage of fully validated parsers for the above languages. Assuming you want to manipulate C#,
you can then write this DMS script to accomplish your example on arbitrarily big C# ASTs:
domain CSharp~CSharp7_5; -- establishes which parser to use to read source code
rule distribute_times_over_plus(a:multiplicative_expression,
b:additive_expression,
c:multiplicative_expression)
:multiplicative_expression->multiplicative_expression
= "\a*(\b+\c)" -> "(\a*\b+\a*\c)";
You can hand this script to DMS, and it will parse a C# source file, and apply this transform everywhere the pattern is found. (If you want more control over where/when this is applied, you have write an additional metaprogramming script to define that rather than leaning on the built-in "apply everywhere" operation).
It should be clear this is a lot easier to write; not so clear but a big benefit is that it is checked for sanity by DMS. You can't write a rule that breaks the language syntax. (If you write procedural code, you can link the your nodes in any nonsensical way, then you get to debug it). This is a huge help if you want to write many rules: there's a whole class of errors you can't make. Finally, these rules a LOT more readable than whatever that procedural code you might write will be; that makes them much easier to read, understand and modify.
More details on what you can write in rules can be found at DMS Rewrite Rules.
If you want to see this example in full detail from defining a language ("college calculus") and applying rules to that language ("how to differentiate formulas") you can see it at: Algebra as a DMS Domain
One more (huge) detail: rewriting on plain ASTs isn't very effective if they represent programming langauges, because you cannot ignore the meaning and scope of identifiers. See Life After Parsing" for a deep discussion.
But the the bottom line is that your rewrite rules often need to be conditional on semantic properties of the programming language you intend to manipulate. DMS rules handle that by allowing an additional if condition clause that can invoke semantic predicates defined for that langauge to DMS. You can see some simple examples of this in the Algebra example.
This is a one-line program in prolog .
And as a bonus , it works both ways .
i.e. you design it to "expand" , you get "unexpand" for free . Here is an example that uses the interactive REPL of yap prolog . The identifiers that are all capital letters are the variables .
$ yap
YAP 6.2.2 (x86_64-linux): Sat Sep 17 13:59:03 UTC 2016
?- [user].
/* consulting user_input... */
rewrite(A * (B + C), (A * B + A * C)) .
end_of_file .
/* example usage from the REPL */
?- rewrite(3 * (4 + 5), REWRITTEN) .
REWRITTEN = 3*4+3*5
?- rewrite(a * (b + c), REWRITTEN) .
REWRITTEN = a*b+a*c
/* example usage showing it work the opposite way */
?- rewrite(REWRITABLE,(3*4+3*5)) .
REWRITABLE = 3*(4+5)