How do I decide when should I do rehashing of entire hash table?
相关问题
- facebook error invalid key hash for some devices
- Finding k smallest elements in a min heap - worst-
- binary search tree path list
- High cost encryption but less cost decryption
- How to get a fixed number of evenly spaced points
相关文章
- What are the problems associated to Best First Sea
- Coin change DP solution to keep track of coins
- Bcrypt vs Hash in laravel
- Algorithm for partially filling a polygonal mesh
- Robust polygon normal calculation
- Algorithm for maximizing coverage of rectangular a
- How to measure complexity of a string?
- Select unique/deduplication in SSE/AVX
A rule of thumb is to resize the table once it's 3/4 full.
Generally, you have a hash table containing N elements distributed in an array of M slots.
There is a percent value (called "growthFactor") defined by the user when instantiating the hash table that is used in this way:
the rehash means your array of M slots should be resized to contain more elements (a prime number greater than the current size (or 2x greater) is ideal) and that your elements must be distributed in the new larger array.
Such value should set to something between 0.6 and 0.8.
This depends a great deal on how you're resolving collisions. If you user linear probing, performance usually starts to drop pretty badly with a load factor much higher than 60% or so. If you use double hashing, a load factor of 80-85% is usually pretty reasonable. If you use collision chaining, performance usually remains reasonable with load factors up to around 150% or or more.
I've sometimes even created a hash table with balanced trees for collision resolution. In this case, you can almost forget about re-hashing -- the performance doesn't start to deteriorate noticeably until the number of items exceeds the table size by at least a couple orders of magnitude.