GroupBy on complex object (e.g. List)

Posted 2020-02-01 06:50

Using GroupBy() and Count() > 1 I'm trying to find duplicate instances of my class in a list.

The class looks like this:

public class SampleObject
{
    public string Id;
    public IEnumerable<string> Events;
}

And this is how I instantiate and group the list:

public class Program
{
    private static void Main(string[] args)
    {
        var items = new List<SampleObject>()
        {
            new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } },
            new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } }
        };

        var duplicates = items.GroupBy(x => new { Token = x.Id, x.Events })
                         .Where(g => g.Count() > 1)
                         .Select(g => g.Key)
                         .ToList();
    }
}

The duplicates list contains no items. How can I make the grouping work?

3 Answers
forever°为你锁心
#2 · 2020-02-01 07:28

GroupBy() uses the default equality comparison for the key, and under that comparison your two lists are not equal.

See the following code:

var eventList1 = new List<string>() { "ExampleEvent" };
var eventList2 = new List<string>() { "ExampleEvent" };

Console.WriteLine(eventList1.GetHashCode());
Console.WriteLine(eventList2.GetHashCode());
Console.WriteLine(eventList1.Equals(eventList2));

Two "equal" lists, right? However, this prints something like the following (the hash codes vary from run to run):

796641852
1064243573
False

So the lists are not considered equal, and the two objects are therefore not grouped together.
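To make the distinction concrete, here is a minimal sketch contrasting List<T>.Equals (a reference comparison) with Enumerable.SequenceEqual (an element-by-element comparison). The class name ListEqualityDemo is only illustrative and is not part of the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListEqualityDemo
{
    public static void Main()
    {
        var eventList1 = new List<string> { "ExampleEvent" };
        var eventList2 = new List<string> { "ExampleEvent" };

        // List<T> does not override Equals, so this compares references.
        Console.WriteLine(eventList1.Equals(eventList2));        // False

        // SequenceEqual compares the elements pairwise, in order.
        Console.WriteLine(eventList1.SequenceEqual(eventList2)); // True

        // The same instance is always equal to itself.
        Console.WriteLine(eventList1.Equals(eventList1));        // True
    }
}
```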

You need to provide a custom comparer that compares the relevant properties of the objects. Note that, as shown above, List<T>.GetHashCode() does not reflect the items in the list.

You can do that as follows (adapted from "Good GetHashCode() override for List of Foo objects respecting the order" and "LINQ GroupBy on multiple ref-type fields; Custom EqualityComparer"):

public class SampleObjectComparer : IEqualityComparer<SampleObject>
{
    public bool Equals(SampleObject a, SampleObject b)
    {
        return a.Id == b.Id 
            && a.Events.SequenceEqual(b.Events);
    }

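    public int GetHashCode(SampleObject a)
    {
        // Let the arithmetic wrap around instead of throwing in a checked context.
        unchecked
        {
            int hash = 17;

            hash = hash * 23 + a.Id.GetHashCode();

            // Fold in each event in order, matching the order-sensitive SequenceEqual above.
            foreach (var evt in a.Events)
            {
                hash = hash * 31 + evt.GetHashCode();
            }

            return hash;
        }
    }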
}

And use it like this:

var eventList1 = new List<string>() { "ExampleEvent" };
var eventList2 = new List<string>() { "ExampleEvent" };

var items = new List<SampleObject>()
{
    new SampleObject() { Id = "Id", Events = eventList1 },
    new SampleObject() { Id = "Id", Events = eventList2 }
};

var duplicates = items.GroupBy(x => x, new SampleObjectComparer())
                 .Where(g => g.Count() > 1)
                 .Select(g => g.Key)
                 .ToList();

Console.WriteLine(duplicates.Count);
叼着烟拽天下
#3 · 2020-02-01 07:33

To get objects to work with many of LINQ's operators, such as GroupBy or Distinct, you must either override GetHashCode and Equals on the class or provide a custom comparer.

In your case, because one of the properties is a list, you probably need a comparer, unless you make the list read-only.

Try this comparer:

public class SampleObjectComparer : IEqualityComparer<SampleObject>
{
    public bool Equals(SampleObject x, SampleObject y)
    {
        return x.Id == y.Id && x.Events.SequenceEqual(y.Events);
    }

    public int GetHashCode(SampleObject x)
    {
        return x.Id.GetHashCode() ^ x.Events.Aggregate(0, (a, y) => a ^ y.GetHashCode());
    }
}

Now this code works:

var items = new List<SampleObject>()
{
    new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } },
    new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } }
};

var comparer = new SampleObjectComparer();

var duplicates = items.GroupBy(x => x, comparer)
                 .Where(g => g.Count() > 1)
                 .Select(g => g.Key)
                 .ToList();
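
For completeness, the other option mentioned at the top of this answer (overriding Equals and GetHashCode on the class itself) could look roughly like the sketch below. It assumes the object can be made immutable. ImmutableSampleObject and OverrideDemo are hypothetical names, not types from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical immutable variant of SampleObject (not from the question).
public sealed class ImmutableSampleObject : IEquatable<ImmutableSampleObject>
{
    public string Id { get; }
    public IReadOnlyList<string> Events { get; }

    public ImmutableSampleObject(string id, IEnumerable<string> events)
    {
        Id = id ?? throw new ArgumentNullException(nameof(id));
        // Copy the events so later changes to the caller's list cannot alter the hash.
        Events = (events ?? throw new ArgumentNullException(nameof(events))).ToList();
    }

    public bool Equals(ImmutableSampleObject other)
    {
        return other != null && Id == other.Id && Events.SequenceEqual(other.Events);
    }

    public override bool Equals(object obj) => Equals(obj as ImmutableSampleObject);

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + Id.GetHashCode();
            foreach (var evt in Events)
                hash = hash * 31 + (evt == null ? 0 : evt.GetHashCode());
            return hash;
        }
    }
}

public static class OverrideDemo
{
    // No comparer argument: GroupBy falls back to the overridden Equals/GetHashCode.
    public static int CountDuplicateGroups(IEnumerable<ImmutableSampleObject> items)
    {
        return items.GroupBy(x => x).Count(g => g.Count() > 1);
    }

    public static void Main()
    {
        var items = new[]
        {
            new ImmutableSampleObject("Id", new[] { "ExampleEvent" }),
            new ImmutableSampleObject("Id", new[] { "ExampleEvent" }),
            new ImmutableSampleObject("Other", new[] { "ExampleEvent" })
        };

        Console.WriteLine(CountDuplicateGroups(items)); // 1
    }
}
```

Copying the events in the constructor is a deliberate choice here: if the list could change after the object has been used as a key, its hash code would change too, which is why a mutable list usually calls for an external comparer instead.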
你好瞎i
#4 · 2020-02-01 07:47

List<T> does not override Equals or GetHashCode, which is why your GroupBy doesn't work as expected. One of the two properties of the anonymous type refers to the list. When GroupBy has to compare two lists, Object.ReferenceEquals is used, which only checks whether both are the same reference, not whether both contain the same elements.
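
A small sketch of this behaviour, assuming nothing beyond the standard library: anonymous types compare property by property, so keys made only of strings match, while keys holding two different list instances do not. AnonymousKeyDemo and the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

public static class AnonymousKeyDemo
{
    public static void Main()
    {
        var list1 = new List<string> { "ExampleEvent" };
        var list2 = new List<string> { "ExampleEvent" };

        // Both properties are strings, which compare by value.
        var stringsOnlyA = new { Token = "Id", Name = "ExampleEvent" };
        var stringsOnlyB = new { Token = "Id", Name = "ExampleEvent" };
        Console.WriteLine(stringsOnlyA.Equals(stringsOnlyB)); // True

        // The Events property holds two different list instances.
        var withListA = new { Token = "Id", Events = list1 };
        var withListB = new { Token = "Id", Events = list2 };
        Console.WriteLine(withListA.Equals(withListB));       // False

        // Sharing the very same list instance makes the keys equal again.
        var withSameList = new { Token = "Id", Events = list1 };
        Console.WriteLine(withListA.Equals(withSameList));    // True
    }
}
```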

You could provide a custom IEqualityComparer<T>:

public class IdEventComparer : IEqualityComparer<SampleObject>
{
    public bool Equals(SampleObject x, SampleObject y)
    {
        if (object.ReferenceEquals(x, y)) 
            return true;
        if (x == null || y == null) 
            return false;
        if(x.Id != y.Id) 
            return false;
        if (x.Events == null && y.Events == null)
            return true;
        if (x.Events == null || y.Events == null)
            return false;

        return x.Events.SequenceEqual(y.Events);
    }

    public int GetHashCode(SampleObject obj)
    {
        if(obj == null) return 23;
        unchecked
        {
            int hash = 23;
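            // Parenthesize the conditional; otherwise + binds tighter than == and ?: and the null check never applies.
            hash = (hash * 31) + (obj.Id == null ? 31 : obj.Id.GetHashCode());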

            if (obj.Events == null) return hash;
            foreach (string item in obj.Events)
            {
                hash = (hash * 31) + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}

You can then pass it to many LINQ methods, including GroupBy:

var duplicates = items.GroupBy(x => x, new IdEventComparer())
     .Where(g => g.Count() > 1)
     .Select(g => g.Key)
     .ToList();
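
As a sketch of the "many LINQ methods" point, the same comparer can also be handed to Distinct or to a HashSet<T>. The block below repeats SampleObject and a condensed IdEventComparer so that it compiles on its own. The condensed version assumes Id and Events are never null, and DistinctDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SampleObject
{
    public string Id;
    public IEnumerable<string> Events;
}

// Condensed copy of the comparer above; assumes non-null Id and Events.
public class IdEventComparer : IEqualityComparer<SampleObject>
{
    public bool Equals(SampleObject x, SampleObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Id == y.Id && x.Events.SequenceEqual(y.Events);
    }

    public int GetHashCode(SampleObject obj)
    {
        unchecked
        {
            int hash = 23;
            hash = hash * 31 + obj.Id.GetHashCode();
            foreach (string item in obj.Events)
                hash = hash * 31 + (item == null ? 0 : item.GetHashCode());
            return hash;
        }
    }
}

public static class DistinctDemo
{
    public static void Main()
    {
        var items = new List<SampleObject>
        {
            new SampleObject { Id = "Id", Events = new List<string> { "ExampleEvent" } },
            new SampleObject { Id = "Id", Events = new List<string> { "ExampleEvent" } },
            new SampleObject { Id = "Id", Events = new List<string> { "OtherEvent" } }
        };

        var comparer = new IdEventComparer();

        // Distinct drops the second, equal object and keeps the other two.
        Console.WriteLine(items.Distinct(comparer).Count()); // 2

        // HashSet<T> accepts the same comparer in its constructor.
        var set = new HashSet<SampleObject>(items, comparer);
        Console.WriteLine(set.Count);                        // 2
    }
}
```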