Ensuring One Value per HashMap Bucket/Slot

Published 2019-02-21 00:59

Question:

Is there a way to strictly bound the number of entries per HashMap bucket without tampering with the object's hashCode() implementation in Java?

The load factor is an average: (# of entries) / (# of buckets). Say I have a HashMap with capacity 1000 and, for the sake of this example, a load factor of 1. The 100 objects I am going to store in the HashMap have a bad hashCode() implementation that returns the same value for every object. When I'm done storing the 100 objects, they all map to the same bucket and I end up with linked-list performance. The load factor stays silent because 100 entries / 1000 buckets = 0.1 < 1. Now what happens if I put in 1 M such objects? The resize that the load factor eventually triggers is no use anyway, since every entry still lands in the same bucket.
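The degenerate case described above is easy to reproduce. Below is a small sketch (the `BadKey` class name, the constant hash value, and the sizes are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// A key whose hashCode() is deliberately constant: every instance
// collides, so all entries land in the same HashMap bucket.
class BadKey {
    private final int id;

    BadKey(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42; // same value for every instance
    }
}

public class BadHashDemo {
    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>(1000);
        for (int i = 0; i < 100; i++) {
            map.put(new BadKey(i), i);
        }
        // All 100 entries are stored and retrievable, but every get()
        // has to search the single overcrowded bucket.
        System.out.println(map.size());             // 100
        System.out.println(map.get(new BadKey(7))); // 7
    }
}
```

The map stays fully functional; only its lookup cost degrades, and nothing in the API warns about it.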

I know this is an uncommon scenario in the real world, but I would like to improve my understanding. Is there a way to prevent this in HashMap, or at least get some warning from the data structure itself?

Answer 1:

A HashMap always calculates which bucket to use from the key's hash code. If every key has the same hash code, they all map to the same bucket. You cannot prevent the behavior you describe without providing a better hashCode() implementation.

You could look at Map implementations that use open addressing (e.g. Trove's THashMap). They always have at most one entry per bucket. But performance will not improve: they just deal with collisions in a different way, and they won't solve your root problem, a bad hash code.



Answer 2:

Writing a perfect hash function is the only way to achieve what you are looking for.

Given a small, privileged set of inputs the permutation table can be adjusted so that those inputs yield distinct hash values, producing what is called a perfect hash function.

Check out Pearson hashing.
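For reference, the core of Pearson hashing is a few lines. In the sketch below the permutation table is generated by an arbitrary formula (multiplying by an odd constant mod 256 is a bijection of 0..255) just to keep the example short; a *perfect* hash function would instead use a table tuned, by adjustment or search, to map your specific key set to distinct values.

```java
// Sketch of Pearson hashing: an 8-bit hash driven by a 256-entry
// permutation table T. The table here is an arbitrary permutation of
// 0..255, not one tuned to any particular key set.
public class Pearson {
    private static final int[] T = new int[256];
    static {
        for (int i = 0; i < 256; i++) {
            // i * 167 + 13 mod 256 is a bijection because 167 is odd.
            T[i] = (i * 167 + 13) & 0xFF;
        }
    }

    public static int hash(byte[] data) {
        int h = 0;
        for (byte b : data) {
            h = T[h ^ (b & 0xFF)]; // fold each byte through the table
        }
        return h; // always in 0..255
    }

    public static void main(String[] args) {
        System.out.println(Pearson.hash("hello".getBytes()));
        System.out.println(Pearson.hash("world".getBytes()));
    }
}
```

With only 256 possible outputs, this variant can only be perfect for small key sets; wider variants exist, but the table-tuning idea is the same.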