Using GroupBy() and Count() > 1
I'm trying to find duplicate instances of my class in a list.
The class looks like this:
public class SampleObject
{
    public string Id;
    public IEnumerable<string> Events;
}
And this is how I instantiate and group the list:
public class Program
{
    private static void Main(string[] args)
    {
        var items = new List<SampleObject>()
        {
            new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } },
            new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } }
        };
        var duplicates = items.GroupBy(x => new { Token = x.Id, x.Events })
                              .Where(g => g.Count() > 1)
                              .Select(g => g.Key)
                              .ToList();
    }
}
The duplicates list contains no items. How can I make the grouping work?
To get objects to work with many of LINQ's operators, such as GroupBy or Distinct, you must either implement GetHashCode and Equals, or provide a custom comparer.
In your case, because one of the properties is a list, you probably need a comparer, unless you make the list read-only.
Try this comparer:
public class SampleObjectComparer : IEqualityComparer<SampleObject>
{
    public bool Equals(SampleObject x, SampleObject y)
    {
        return x.Id == y.Id && x.Events.SequenceEqual(y.Events);
    }
    public int GetHashCode(SampleObject x)
    {
        return x.Id.GetHashCode() ^ x.Events.Aggregate(0, (a, y) => a ^ y.GetHashCode());
    }
}
Now this code works:
var items = new List<SampleObject>()
{
    new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } },
    new SampleObject() { Id = "Id", Events = new List<string>() { "ExampleEvent" } }
};
var comparer = new SampleObjectComparer();
var duplicates = items.GroupBy(x => x, comparer)
                      .Where(g => g.Count() > 1)
                      .Select(g => g.Key)
                      .ToList();
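For comparison, the other option mentioned above (implementing Equals and GetHashCode on the class itself) could look roughly like the sketch below. EquatableSampleObject and EquatableDemo are hypothetical names used only for illustration; they are not part of the question's code. The sketch assumes that the order of the events matters and that null values should be tolerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of SampleObject that carries its own value equality.
public sealed class EquatableSampleObject : IEquatable<EquatableSampleObject>
{
    public string Id;
    public IEnumerable<string> Events;

    public bool Equals(EquatableSampleObject other)
    {
        if (ReferenceEquals(other, null)) return false;
        if (ReferenceEquals(this, other)) return true;
        if (Id != other.Id) return false;
        // Two null event sequences count as equal; one null and one non-null do not.
        if (Events == null || other.Events == null)
            return Events == null && other.Events == null;
        return Events.SequenceEqual(other.Events);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as EquatableSampleObject);
    }

    public override int GetHashCode()
    {
        unchecked // let the arithmetic wrap around instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + (Id == null ? 0 : Id.GetHashCode());
            if (Events != null)
            {
                foreach (string evt in Events)
                    hash = hash * 31 + (evt == null ? 0 : evt.GetHashCode());
            }
            return hash;
        }
    }
}

public static class EquatableDemo
{
    // Returns the number of groups that contain more than one item.
    public static int CountDuplicateGroups()
    {
        var items = new List<EquatableSampleObject>
        {
            new EquatableSampleObject { Id = "Id", Events = new List<string> { "ExampleEvent" } },
            new EquatableSampleObject { Id = "Id", Events = new List<string> { "ExampleEvent" } }
        };
        // No comparer argument: GroupBy uses the Equals/GetHashCode defined above.
        return items.GroupBy(x => x).Count(g => g.Count() > 1);
    }
}
```

With this in place, GroupBy(x => x) needs no comparer argument. The trade-off is that the hash code depends on mutable state, so an object should not be modified while it is being used as a key in a hash-based collection, which is one reason a read-only list is the safer choice here.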
List<T> does not override Equals or GetHashCode, which is why your GroupBy doesn't work as expected. One of the two properties of the anonymous type refers to the list. When GroupBy has to compare two lists, the comparison behaves like Object.ReferenceEquals, which only checks whether both are the same reference, not whether both contain the same elements.
You could provide a custom IEqualityComparer<T>:
public class IdEventComparer : IEqualityComparer<SampleObject>
{
    public bool Equals(SampleObject x, SampleObject y)
    {
        if (object.ReferenceEquals(x, y))
            return true;
        if (x == null || y == null)
            return false;
        if (x.Id != y.Id)
            return false;
        if (x.Events == null && y.Events == null)
            return true;
        if (x.Events == null || y.Events == null)
            return false;
        return x.Events.SequenceEqual(y.Events);
    }
    public int GetHashCode(SampleObject obj)
    {
        if (obj == null) return 23;
        unchecked
        {
            int hash = 23;
            // The parentheses matter: without them, "+" binds tighter than "==" and "?:".
            hash = (hash * 31) + (obj.Id == null ? 31 : obj.Id.GetHashCode());
            if (obj.Events == null) return hash;
            foreach (string item in obj.Events)
            {
                hash = (hash * 31) + (item == null ? 0 : item.GetHashCode());
            }
            return hash;
        }
    }
}
Then you can pass it to the many LINQ methods that accept a comparer, including GroupBy:
var duplicates = items.GroupBy(x => x, new IdEventComparer())
                      .Where(g => g.Count() > 1)
                      .Select(g => g.Key)
                      .ToList();
GroupBy() performs a default comparison, which treats your two lists as not equal.
See the following code:
var eventList1 = new List<string>() { "ExampleEvent" };
var eventList2 = new List<string>() { "ExampleEvent" };
Console.WriteLine(eventList1.GetHashCode());
Console.WriteLine(eventList2.GetHashCode());
Console.WriteLine(eventList1.Equals(eventList2));
Two "equal" lists, right? However, this will print something like the following (the exact hash codes vary between runs):
796641852
1064243573
False
So they're not considered equal, hence not grouped.
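The same effect can be seen on the anonymous type used as the grouping key in the question. The sketch below is only an illustration (AnonymousKeyDemo is a hypothetical name): anonymous types compare their properties one by one, so two keys are equal only when their Events properties point at the very same list instance.

```csharp
using System;
using System.Collections.Generic;

public static class AnonymousKeyDemo
{
    // Both keys share one list instance, so every property compares equal.
    public static bool SameListInstance()
    {
        var events = new List<string> { "ExampleEvent" };
        var a = new { Token = "Id", Events = events };
        var b = new { Token = "Id", Events = events };
        return a.Equals(b); // true
    }

    // Each key has its own list; List<T> uses reference equality, so the keys differ.
    public static bool DifferentListInstances()
    {
        var a = new { Token = "Id", Events = new List<string> { "ExampleEvent" } };
        var b = new { Token = "Id", Events = new List<string> { "ExampleEvent" } };
        return a.Equals(b); // false
    }
}
```

This is why the original query returns nothing: every item carries its own list, so every key ends up in its own group of size one.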
You need to provide a custom comparer that compares the relevant properties of the objects. Note that, as shown above, List<T>.GetHashCode() does not take the items in the list into account.
You can do that as follows (from "Good GetHashCode() override for List of Foo objects respecting the order" and "LINQ GroupBy on multiple ref-type fields; Custom EqualityComparer"):
public class SampleObjectComparer : IEqualityComparer<SampleObject>
{
    public bool Equals(SampleObject a, SampleObject b)
    {
        return a.Id == b.Id
            && a.Events.SequenceEqual(b.Events);
    }
    public int GetHashCode(SampleObject a)
    {
        int hash = 17;
        hash = hash * 23 + a.Id.GetHashCode();
        foreach (var evt in a.Events)
        {
            hash = hash * 31 + evt.GetHashCode();
        }
        return hash;
    }
}
And use it like this:
var eventList1 = new List<string>() { "ExampleEvent" };
var eventList2 = new List<string>() { "ExampleEvent" };
var items = new List<SampleObject>()
{
    new SampleObject() { Id = "Id", Events = eventList1 },
    new SampleObject() { Id = "Id", Events = eventList2 }
};
var duplicates = items.GroupBy(x => x, new SampleObjectComparer())
                      .Where(g => g.Count() > 1)
                      .Select(g => g.Key)
                      .ToList();
Console.WriteLine(duplicates.Count);