I found about Cuckoo Hash tables and they seem pretty good.
But most examples code I found implement this using 2 tables.
This seems to me wrong because the 2 tables may be in different memory pages and we have overhead of fetching random addresses and have no real locality.
Is it not possible to use 1 array instead of 2?
Is it perhaps not possible to detect when an element has already been kicked out 2 times and is time for resizing?
相关问题
- facebook error invalid key hash for some devices
- Finding k smallest elements in a min heap - worst-
- binary search tree path list
- High cost encryption but less cost decryption
- C++ a class with an array of structs, without know
相关文章
- Optimization techniques for backtracking regex imp
- What are the problems associated to Best First Sea
- Coin change DP solution to keep track of coins
- Bcrypt vs Hash in laravel
- Algorithm for partially filling a polygonal mesh
- Robust polygon normal calculation
- Algorithm for maximizing coverage of rectangular a
- Logging Django SQL queries with DEBUG set to False
Yes.
http://www.spoj.com/problems/CUCKOO/
You can check this problem on spoj,we need to do this problem using a single hash table and two hash functions.
To answer the confusion in the comments: no this is not language specific. If you're thinking about memory locality and want to ensure the two tables are close, then a single allocation is the way to go (however you allocate). In java this may look as follows:
If this performs better than accessing into two separate arrays is dependant on your JVM however: just think about the JIT being able to merge two small allocations into a single larger one on the fly - without you having to perform any index-magic.
You can definitely do a cuckoo hashtable with a single hash table; that is, where the two positions for each object are simply positions within a single hash table.
The only small problem to be solved is how to decide during the cuckoo eviction loop which of the two positions to use for an evicted key. Of course, you can just try one position and use the other one if the first one was the same as the actual position. It should be possible to use SIMD to compute both hashes in parallel, so the cost of this strategy might be small.
However, if you want to guarantee a single hash computation during the cuckoo loop, there is a simple solution: instead of using
H0(k)
andH1(k)
as the two positions, useH0(k)
andH0(k) xor H1(k)
. (IfH1
is independent fromH0
, then so isH0 xor H1
, so the xor does not affect the distribution of hash values.) With this modification, you can always find "the other position" ofk
by xor'ing the current position withH1(k)
, so only a single hash computation is needed in the loop.While that allows you to use a single hash table, and may even simplify the code, there is not a lot of evidence that it improves the operation of the algorithm. In my limited testing, it seems to increase the number of loop iterations by 40-50%. (Although it needs to be emphasized that in the vast majority of cases, a new key can be inserted into the table without entering the loop at all, so the increased number of loops is hardly noticeable in the actual execution time.)
Well, all forms of hashing are murder on caches.
Anyways you can easily combine the two into a single table. But then how do you tell whether you're on your first hash function or the second? The options are add that as metadata to every bucket, or else figure it out by running the first hash function, seeing whether you got the current location, and running the second only if you were on the first. That either requires extra space, or running more hash functions.
Splitting the table into 2 solves that problem more efficiently. And statistically you need the same number of buckets to store the same number of things whether or not the table has been split. So your whole hash table becomes smaller.