Using C# HashSet to solve problems where equal is

2019-06-04 13:04发布

问题:

I'm basing this on performance characteristics I've recently found out about Dictionary, so I'm using Dictionary<type, bool> where the bool is ignored but supposedly I could use HashSet instead.

For example:

Dictionary<bounds, bool> overlap;

class bounds
{
    public float top_left_x, top_left_y, width, height;

    public bool equal(bounds other)
    {
        return upper_left_x + width > other.upper_left_x &&
        upper_left_x < other.upper_left_x + other.width &&
        upper_left_y + height > other.upper_left_y &&
        upper_left_y < other.upper_left_y + other.height;
    }

    public ... GetHashCode()
    {
        ...;
    }
}

Here I'm not using equal to check for equality but instead for overlapping, which is bound to be annoying elsewhere but there is a reason why I'm doing it.

I'm presuming that if a value can be looked up from a key in O(1) time then so can a key from itself.

So I could presumably put thousands of bounds into overlap and do this:

overlap.ContainsKey(new bounds(...));

To find out in O(1) time if a given bound overlaps any others from the collection.

I'd also like to know what happens if I change the (x, y) position of a bound, presumably it's like removing and then adding it into the set again performance wise, very expensive?

What do I put into the GetHashCode function?

goal

If this works then I'm after using this sort of mechanism to find out what other bounds a given bound overlaps.

Very few bounds move in this system and no new ones are added after the collection has been populated. Newly added bounds need to be able to overlap old ones.

conclusion

See the feedback below for more details.

In summary it's not possible to achieve the O(1) performance because, unlike the default equals, a check for overlapping is not transitive.

An interval tree however is a good solution.

回答1:

Here I'm not using equal to check for equality but instead for overlapping, which is bound to be annoying elsewhere but there is a reason why I'm doing it.

I'm assuming this means you will have a scenario where A.Equals(B) is true, B.Equals(C) is true, but A.Equals(C) is false. In other words, your Equals is not transitive.

That is breaking the rules of Equals(), and as a result Dictionary will not work for you. The rule of Equals/GetHashCode is (from http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx):

If two objects compare as equal, the GetHashCode method for each object must return the same value.

If your Equals is not transitive, then you can't possibly write a valid GetHashCode.



回答2:

The equality relation is completely the wrong relation to use here because equality is required to be an equivalence relation. That is, it must be reflexive -- A == A for any A. It must be symmetric -- A == B implies that B == A. And it must be transitive -- if A == B and B == C then A == C.

You are proposing a violation of the transitive property; "overlaps" is not a transitive relation, therefore "overlaps" is not an equivalence relation, and therefore you cannot define equality as overlapping.

Rather than trying to do this dangerous thing, solve the real problem. Your goal is apparently to take a set of intervals, and then quickly determine whether a given interval overlaps any of those intervals. The data structure you want is called an interval tree; it is specifically optimized to solve exactly that problem, so use it. Under no circumstances should you attempt to use a hash set as an interval tree. Use the right tool for the job:

http://wikipedia.org/wiki/Interval_tree



回答3:

If you use the derived class approach I mentioned above, you need the following:

public class Bounds
{
    public Point position;
    public Point size; // I know the width and height don't really compose
                       // a point, but this is just for demonstration

    public override int GetHashCode(){...}
}

public class OverlappingBounds : Bounds
{
    public override bool Equals(object other)
    {
        // your implementation here
    }
}

// Usage:
if (_bounds.ContainsKey(new OverlappingBounds(...))){...}

but since the GetHashCode() method needs to return always the same value, the runtime complexity will most likely be at O(n) instead of O(1).



回答4:

You can't use a Dictionary or HashSet to check if bounds overlap. To be able to use a dictionary (or hashset), you need an Equals() and a GetHashCode() method that meet the following properties:

  1. The Equals() method is an equivalence relation
  2. a.Equals(b) must imply a.GetHashCode() == b.GetHashCode()

You can't meet either of these requirements, so you have to use another datastructure: An Interval tree.



回答5:

You can not guarantee O(1) performance on dictionary where you customize hashcode calculation. If I put inside GetHashCode() method some WebService request which should control for me the equality of 2 provided items, it's clear that the time can never be O(1) as expected. Ok, this is kind of "edge case", but just to give an idea.

By doing in a way you think to do (assuming that this is even possible), imo, you negate the benefits provided by Dictionary<K,V>, so the constant key recovery time also on big collections.

It need to be measured on reasonable amount of objects you have, but I would first try to use List<T> like an object holder, and make something like this:

var bounds = new List<Bound> {.... initialization... }
Bound providedBound = //something. Some data filled in it. 
var overlappedany = bounds.Any<Bound>(b=>return b.Equals(providedBound));