When should I do rehashing of entire hash table?

2019-04-12 10:36发布

问题:

How do I decide when should I do rehashing of entire hash table?

回答1:

This depends a great deal on how you're resolving collisions. If you user linear probing, performance usually starts to drop pretty badly with a load factor much higher than 60% or so. If you use double hashing, a load factor of 80-85% is usually pretty reasonable. If you use collision chaining, performance usually remains reasonable with load factors up to around 150% or or more.

I've sometimes even created a hash table with balanced trees for collision resolution. In this case, you can almost forget about re-hashing -- the performance doesn't start to deteriorate noticeably until the number of items exceeds the table size by at least a couple orders of magnitude.



回答2:

Generally, you have a hash table containing N elements distributed in an array of M slots.

There is a percent value (called "growthFactor") defined by the user when instantiating the hash table that is used in this way:

if (growthRatio < (N/M))
  Rehash();

the rehash means your array of M slots should be resized to contain more elements (a prime number greater than the current size (or 2x greater) is ideal) and that your elements must be distributed in the new larger array.

Such value should set to something between 0.6 and 0.8.



回答3:

A rule of thumb is to resize the table once it's 3/4 full.