using expression trees to compare objects by a sin

Posted 2019-05-31 17:07

Question:

I am trying to use expression trees because, based on their description, they seem to be the most correct (performant, configurable) approach.

I expect to be able to craft a statement that gets the first item from the existingItems collection that matches the propertyNameToCompareOn value of the incomingItem.

I have a method with the following signature and simulated code body...

DetectDifferences<T>(List<T> incomingItems, List<T> existingItems)
{
  var propertyNameToCompareOn = GetThisValueFromAConfigFile(typeof(T).FullName);

  //does this belong outside of the loop?
  var leftParam = Expression.Parameter(typeof(T), "left");
  var leftProperty = Expression.Property(leftParam, propertyNameToCompareOn);
  var rightParam = Expression.Parameter(typeof(T), "right");
  var rightProperty = Expression.Property(rightParam, propertyNameToCompareOn);
  //this throws the error
  var condition = Expression.Lambda<Func<T, bool>>(Expression.Equal(leftProperty, rightProperty));

  foreach (var incomingItem in incomingItems) //could be a parallel or something else. 
  {
     // also, where am I supposed to provide incomingItem to this statement?
     var existingItem = existingItems.FirstOrDefault(expression/condition/idk);

     // the statement for Foo would be something like
     var existingFoos = existingItems.FirstOrDefault(f => f.Bar.Equals(incomingItem.Bar));

     //if item does not exist, consider it new for persistence
     //if item does exist, compare a configured list of the remaining properties between the 
     // objects. If they are all the same, report no changes. If any
     // important property is different, capture the differences for 
     // persistence. (This is where precalculating hashes seems like the 
     // wrong approach due to expense.)
  }
}

At the marked line above, I get an InvalidOperationException: "Incorrect number of parameters supplied for lambda declaration". At this point I am just hacking things together from the web and I really don't know what this wants. There are a bunch of overloads that VS can fill my screen with, and none of the examples from the articles on MSDN/SO make sense to me.

PS - I don't really want an IComparer or similar implementation if it can be helped. I can do that with reflection. I need this to be as fast as possible while still being callable for multiple types, hence the choice of expression trees.

Answer 1:

Here's a method to make a property access expression:

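    public static Expression<Func<T, object>> MakeLambda<T>(string propertyName)
    {
        var param = Expression.Parameter(typeof(T));
        var propertyInfo = typeof(T).GetProperty(propertyName);
        var expr = Expression.MakeMemberAccess(param, propertyInfo);
        // Box the result so value-type properties (int, Guid, ...) fit Func<T, object>;
        // without the Convert, Expression.Lambda throws for them.
        var boxed = Expression.Convert(expr, typeof(object));
        var lambda = Expression.Lambda<Func<T, object>>(boxed, param);
        return lambda;
    }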

which you can use like this:

 var accessor = MakeLambda<Foo>("Name").Compile();
 accessor(myFooInstance); // returns name

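That makes your missing line:

 var existingItem = existingItems.FirstOrDefault(e => object.Equals(accessor(e), accessor(incomingItem)));

Be aware that the accessor returns object, so == on its results would compare references: that is false for separately boxed value types like ints and unreliable for strings. object.Equals compares the underlying values instead.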
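Because the accessor is typed as Func<T, object>, whatever it returns is compared as an object. A minimal sketch of what that means for == versus Equals; the class and method names here are illustrative and not part of the answer:

```csharp
using System;

public static class BoxedEqualityDemo
{
    // == on two object operands is a reference comparison.
    public static bool CompareWithOperator(object a, object b) => a == b;

    // object.Equals dispatches to the runtime type's Equals and handles nulls.
    public static bool CompareWithEquals(object a, object b) => object.Equals(a, b);

    public static void Main()
    {
        int id = 42;
        // Each argument is boxed separately, so the references differ.
        Console.WriteLine(CompareWithOperator(id, id)); // False
        // The boxed values are equal.
        Console.WriteLine(CompareWithEquals(id, id));   // True
    }
}
```

The same reasoning applies to the values coming out of a compiled Func<T, object> accessor, which is why a value comparison is the safer choice there.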

Here's a quick benchmark showing that the compiled lambda is much faster than reflection (PropertyInfo.GetValue):

    static void Main(string[] args)
    {
        var l1 = new List<Foo> { };
        for(var i = 0; i < 10000000; i++)
        {
            l1.Add(new Foo { Name = "x" + i.ToString() });
        }

        var propertyName = nameof(Foo.Name);
        var lambda = MakeLambda<Foo>(propertyName);
        var f = lambda.Compile();

        var propertyInfo = typeof(Foo).GetProperty(nameof(Foo.Name));

        var sw1 = Stopwatch.StartNew();
        foreach (var item in l1)
        {
            var value = f(item);
        }
        sw1.Stop();


        var sw2 = Stopwatch.StartNew();
        foreach (var item in l1)
        {
            var value = propertyInfo.GetValue(item);
        }
        sw2.Stop();

        // Prints "<compiled delegate ms> vs <reflection ms>".
        Console.WriteLine($"{sw1.ElapsedMilliseconds} vs {sw2.ElapsedMilliseconds}");
    }

As someone else also pointed out, though, the nested loop in the question (a FirstOrDefault scan inside a foreach) is O(N^2), and that should probably be the next consideration if efficiency is the driver here.
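One way to avoid the quadratic scan is to index the existing items by the configured property once and then do a dictionary lookup per incoming item. The following is only a sketch under some assumptions: the names DifferenceDetector, MakeAccessor and FindNewItems and the Size property are made up for illustration, key values are assumed to be non-null, and when keys repeat only the first existing item is kept.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class Foo
{
    public string Name { get; set; }
    public int Size { get; set; } // hypothetical extra property
}

public static class DifferenceDetector
{
    // Same idea as MakeLambda, compiled straight away; Convert boxes value-type properties.
    public static Func<T, object> MakeAccessor<T>(string propertyName)
    {
        var param = Expression.Parameter(typeof(T), "item");
        var body = Expression.Convert(Expression.Property(param, propertyName), typeof(object));
        return Expression.Lambda<Func<T, object>>(body, param).Compile();
    }

    // Returns the incoming items that have no counterpart in existingItems.
    public static List<T> FindNewItems<T>(List<T> incomingItems, List<T> existingItems, string propertyName)
    {
        var accessor = MakeAccessor<T>(propertyName);

        // Build the index once: O(N). Assumes non-null keys; first item wins on duplicates.
        var index = new Dictionary<object, T>();
        foreach (var existing in existingItems)
        {
            var key = accessor(existing);
            if (!index.ContainsKey(key)) index.Add(key, existing);
        }

        var newItems = new List<T>();
        foreach (var incoming in incomingItems)
        {
            // Average O(1) lookup instead of a FirstOrDefault scan.
            if (!index.TryGetValue(accessor(incoming), out var match))
                newItems.Add(incoming);
            // else: compare the remaining configured properties of incoming and match here.
        }
        return newItems;
    }

    public static void Main()
    {
        var existing = new List<Foo> { new Foo { Name = "a", Size = 1 }, new Foo { Name = "b", Size = 2 } };
        var incoming = new List<Foo> { new Foo { Name = "b", Size = 5 }, new Foo { Name = "c", Size = 3 } };

        foreach (var item in FindNewItems(incoming, existing, "Name"))
            Console.WriteLine(item.Name); // c
    }
}
```

The dictionary uses the default equality comparer, which calls Equals and GetHashCode on the boxed key, so string and int keys are matched by value.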



Answer 2:

When working with expression trees, it's important to first understand, in real code, what you want to do.

I always begin by first writing out (in static code) what the resulting expression looks like with real C# lambda syntax.

Based on your description, your stated goal is to (dynamically) look up some property of the type T that gives a quick comparison. How would you write this if T and TProperty were both known at compile time? I suspect it would look something like this:

Func<Foo, Foo, bool> comparer = (Foo first, Foo second) => 
    first.FooProperty == second.FooProperty;

Right away we can see that your Expression is wrong. You don't need one input T, you need two!

It should also be obvious why you're getting the InvalidOperationException. You never supplied any parameters to your lambda expression, only the body. Above, 'first' and 'second' are the parameters provided to the lambda. You'll need to provide them to the Expression.Lambda() call too.

var condition = Expression.Lambda<Func<T,T, bool>>(
    Expression.Equal(leftProperty, rightProperty),
    leftParam,
    rightParam);

This simply uses the Expression.Lambda<TDelegate>(Expression, params ParameterExpression[]) overload. Each ParameterExpression you pass must be the same instance that is used in the body. That's it. Don't forget to .Compile() your expression into a delegate if you want to actually invoke it.
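Putting the pieces together, here is a rough end-to-end sketch of the two-parameter comparer, including where incomingItem goes. The names ComparerBuilder and BuildComparer and the Label property are made up for illustration. Note that Expression.Equal needs an equality operator for the property type; it works for primitives and strings, falls back to reference equality for reference types that don't define ==, and throws for structs that don't define one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Foo
{
    public int Bar { get; set; }
    public string Label { get; set; } // hypothetical extra property
}

public static class ComparerBuilder
{
    // Builds and compiles: (left, right) => left.<propertyName> == right.<propertyName>
    public static Func<T, T, bool> BuildComparer<T>(string propertyName)
    {
        var leftParam = Expression.Parameter(typeof(T), "left");
        var rightParam = Expression.Parameter(typeof(T), "right");
        var body = Expression.Equal(
            Expression.Property(leftParam, propertyName),
            Expression.Property(rightParam, propertyName));

        // Both parameters are passed to Lambda; omitting them causes the
        // "Incorrect number of parameters supplied" exception.
        return Expression.Lambda<Func<T, T, bool>>(body, leftParam, rightParam).Compile();
    }

    public static void Main()
    {
        // Build and compile once, outside the loop.
        var comparer = BuildComparer<Foo>("Bar");

        var existingItems = new List<Foo>
        {
            new Foo { Bar = 1, Label = "one" },
            new Foo { Bar = 2, Label = "two" },
        };
        var incomingItem = new Foo { Bar = 2, Label = "incoming" };

        // incomingItem is captured by an ordinary C# lambda and passed as the second argument.
        var existingItem = existingItems.FirstOrDefault(e => comparer(e, incomingItem));
        Console.WriteLine(existingItem?.Label); // two
    }
}
```

Compiling is the expensive step, so the delegate is worth building once per type and property name and reusing, rather than rebuilding it for every item.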

Of course, this doesn't mean that your technique will necessarily be fast. If you're using fancy expression trees to compare two lists with a naive O(n^2) approach, it won't matter.