I'm basing this on performance characteristics I've recently found out about Dictionary
, so I'm using Dictionary<type, bool>
where the bool
is ignored but supposedly I could use HashSet
instead.
For example:
Dictionary<bounds, bool> overlap;
class bounds
{
public float top_left_x, top_left_y, width, height;
public bool equal(bounds other)
{
return upper_left_x + width > other.upper_left_x &&
upper_left_x < other.upper_left_x + other.width &&
upper_left_y + height > other.upper_left_y &&
upper_left_y < other.upper_left_y + other.height;
}
public ... GetHashCode()
{
...;
}
}
Here I'm not using equal to check for equality but instead for overlapping, which is bound to be annoying elsewhere but there is a reason why I'm doing it.
I'm presuming that if a value can be looked up from a key in O(1) time then so can a key from itself.
So I could presumably put thousands of bounds into overlap and do this:
overlap.ContainsKey(new bounds(...));
To find out in O(1) time if a given bound overlaps any others from the collection.
I'd also like to know what happens if I change the (x, y) position of a bound, presumably it's like removing and then adding it into the set again performance wise, very expensive?
What do I put into the GetHashCode function?
goal
If this works then I'm after using this sort of mechanism to find out what other bounds a given bound overlaps.
Very few bounds move in this system and no new ones are added after the collection has been populated. Newly added bounds need to be able to overlap old ones.
conclusion
See the feedback below for more details.
In summary it's not possible to achieve the O(1) performance because, unlike the default equals, a check for overlapping is not transitive.
An interval tree however is a good solution.
I'm assuming this means you will have a scenario where A.Equals(B) is true, B.Equals(C) is true, but A.Equals(C) is false. In other words, your Equals is not transitive.
That is breaking the rules of Equals(), and as a result Dictionary will not work for you. The rule of Equals/GetHashCode is (from http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx):
If two objects compare as equal, the GetHashCode method for each object must return the same value.
If your Equals is not transitive, then you can't possibly write a valid GetHashCode.
You can not guarantee
O(1)
performance on dictionary where you customizehashcode calculation
. If I put insideGetHashCode()
method some WebService request which should control for me the equality of 2 provided items, it's clear that the time can never beO(1)
as expected. Ok, this is kind of "edge case", but just to give an idea.By doing in a way you think to do (assuming that this is even possible), imo, you negate the benefits provided by
Dictionary<K,V>
, so the constant key recovery time also on big collections.It need to be measured on reasonable amount of objects you have, but I would first try to use
List<T>
like an object holder, and make something like this:You can't use a
Dictionary
orHashSet
to check if bounds overlap. To be able to use a dictionary (or hashset), you need anEquals()
and aGetHashCode()
method that meet the following properties:Equals()
method is an equivalence relationa.Equals(b)
must implya.GetHashCode() == b.GetHashCode()
You can't meet either of these requirements, so you have to use another datastructure: An Interval tree.
If you use the derived class approach I mentioned above, you need the following:
but since the GetHashCode() method needs to return always the same value, the runtime complexity will most likely be at O(n) instead of O(1).
The equality relation is completely the wrong relation to use here because equality is required to be an equivalence relation. That is, it must be reflexive -- A == A for any A. It must be symmetric -- A == B implies that B == A. And it must be transitive -- if A == B and B == C then A == C.
You are proposing a violation of the transitive property; "overlaps" is not a transitive relation, therefore "overlaps" is not an equivalence relation, and therefore you cannot define equality as overlapping.
Rather than trying to do this dangerous thing, solve the real problem. Your goal is apparently to take a set of intervals, and then quickly determine whether a given interval overlaps any of those intervals. The data structure you want is called an interval tree; it is specifically optimized to solve exactly that problem, so use it. Under no circumstances should you attempt to use a hash set as an interval tree. Use the right tool for the job:
http://wikipedia.org/wiki/Interval_tree