Are LINQ expression trees proper trees?

2019-04-17 21:59发布

问题:

Are LINQ expression trees proper trees, as in, graphs (directed or not, wikipedia does not seem too agree) without cycles? What is the root of an expression tree from the following C# expression?

(string s) => s.Length

The expression tree looks like this, with "->" denoting the name of the property of the node the other node is accessible through.

     ->Parameters[0]
 Lambda---------Parameter(string s)
    \               /
     \->Body       /->Expression
      \           /
      Member(Length)

When using ExpressionVisitor to visit the LambdaExpression, the ParameterExpression is visited twice. Is there a way to use the ExpressionVisitor to visit the LambdaExpression so that all the nodes are visited exactly once, and in a specific, well-known order (pre-order, in-order, post-order etc.)?

回答1:

Sort of, yes. The actual "trunk" (if you will) of a LambdaExpression is the .Body; the parameters are necessary metadata about the structure of the tree (and what it needs), but .Parameters at the top (your dotted line) isn't really part of the tree's functional graph - it is only when those nodes are used later in the actual body of the tree that they are interesting, as value substitutions.

The ParameterExpression being visited twice is essential, so that it is possible for someone to swap the parameters if they wanted - for example, to build an entire new LambdaExpression with the same number of parameters, but different parameter instances (maybe changing the type).

The order will be fairly stable, but should be considered an implementation detail. For example, given a node such as Add(A,B), it should make no semantic difference whether I visit that A-first vs B-first.



回答2:

Just to add a bit to Marc's correct answer:

Are LINQ expression trees directed graphs without cycles?

First off, yes, an expression tree is a DAG -- a directed acyclic graph.

We know they are acyclic because expression trees are immutable, and therefore have to be built from the leaves up. In such a situation there is no way to make a cycle because all the nodes in the cycle would have to be allocated last, and clearly that's not going to happen.

Because the parts are immutable, the expression "tree" need not actually be a tree per se. As Marc points out, it is required that you re-use the reference for the parameter; that's how we determine when a declared parameter is used. It is somewhat strange, though legal, to re-use other parts too. For example, if you wanted to represent the expression tree for the body of (int x)=>(x + 1) * (x + 1), you could make an expression tree for (x + 1) and then make a multiplication node where both children were that expression tree.

When using ExpressionVisitor to visit the LambdaExpression, the ParameterExpression is visited twice. Is there a way to use the ExpressionVisitor to visit the LambdaExpression so that all the nodes are visited exactly once, and in a specific, well-known order (pre-order, in-order, post-order etc.)?

ExpressionVisitor is an abstract class. You can make your own concrete version of it that has the semantics you like. For example, you can override the Visit method such that it maintains a HashSet of nodes already seen, and does not call Accept on nodes that it has previously accepted.