Again this sample is a very simplified version of my actual problem involving a custom comparer for linq grouping. What have I done wrong?
The code below produces the result below (1.2, 0), (4.1, 0), (4.1, 0), (1.1, 0),
however I was expecting the following since 1.1 and 1.2 are < 1.0 apart. (1.2, 0), (1.1, 0), (4.1, 0), (4.1, 0),
class Program
{
static void Main(string[] args)
{
IEnumerable<Point> points = new List<Point> {
new Point(1.1, 0.0)
, new Point(4.1, 0.0)
, new Point(1.2, 0.0)
, new Point(4.1, 0.0)
};
foreach (var group in points.GroupBy(p => p, new PointComparer()))
{
foreach (var num in group)
Console.Write(num.ToString() + ", ");
Console.WriteLine();
}
Console.ReadLine();
}
}
class PointComparer : IEqualityComparer<Point>
{
public bool Equals(Point a, Point b)
{
return Math.Abs(a.X - b.X) < 1.0;
}
public int GetHashCode(Point point)
{
return point.X.GetHashCode()
^ point.Y.GetHashCode();
}
}
class Point
{
public double X;
public double Y;
public Point(double p1, double p2)
{
X = p1;
Y = p2;
}
public override string ToString()
{
return "(" + X + ", " + Y + ")";
}
}
The grouping algorithm (and I think all LINQ methods) using an equality comparer always first compares hash codes and only executes
Equals
if two hash codes are equal. You can see that if you add tracing statements in the equality comparer:Which results in:
Only for the two points with equal hash codes
Equals
was executed.Now you could trick the comparison by always returning
0
as hash code. If you do that, the output will be:Now for each pair
Equals
was executed, and you've got your grouping.But...
What is "equal"? If you add another point
(2.1, 0.0)
, which points do you want in one group? Using the symbol≈
for the fuzzy equality, we have -but
This means that
1.1
and2.1
will never be in one group (theirEquals
never passes) and that it depends on the order of the points whether1.1
or2.1
are grouped with1.2
.So you're on a slippery slope here. Clustering points by proximity is far from trivial. You're entering the realm of cluster analysis.
Don't forget the effects of
GetHashCode
. There is an expectation thatGetHashCode
will always return the same value for any two objects for eachEquals
would return true. If you fail to meet that expectation, you'll get unexpected results.Specifically,
GroupBy
probably uses something like a hash table to allow it to group items together without having to compare every item with every other item. IfGetHashCode
returns a value that doesn't end up putting two objects into the same bucket of the hash table, it'll assume that they're not equal, and will never try to callEquals
on them.You will find, as you try to figure out a correct implementation for
GetHashCode
that there's a fundamental problem with how you're trying to group your objects. What would you expect if you had points with x-values of1.0
,1.6
, and2.2
?1.0
and2.2
are too far from one another to fall into the same group, but1.6
is close enough to both other points that it should be in the same group with them. So yourEquals
method breaks the Transitive property of equality:If you're trying to do cluster grouping, you're going to need to use a more different data structure and algorithm. If you're just trying to normalize the points' locations somewhat, you might be able to just say
points.GroupBy(p => (int)p.X)
and avoid the equality comparer entirely.