Can we implement cuckoo hashing using 1 table?

I found about Cuckoo Hash tables and they seem pretty good.
But most examples code I found implement this using 2 tables.
This seems to me wrong because the 2 tables may be in different memory pages and we have overhead of fetching random addresses and have no real locality.
Is it not possible to use 1 array instead of 2?
Is it perhaps not possible to detect when an element has already been kicked out 2 times and is time for resizing?

标签： algorithm optimization data-structures hash hashtable

4条回答

萌系小妹纸

2楼-- · 2019-07-11 06:30

Yes.

http://www.spoj.com/problems/CUCKOO/

You can check this problem on spoj,we need to do this problem using a single hash table and two hash functions.

0人赞添加讨论(0) 举报

唯我独甜

3楼-- · 2019-07-11 06:42

To answer the confusion in the comments: no this is not language specific. If you're thinking about memory locality and want to ensure the two tables are close, then a single allocation is the way to go (however you allocate). In java this may look as follows:

class TwoTables {
    private static final int SIZE_TABLE_FIRST = 11, SIZE_TABLE_SECOND = 29;

    public TwoTables() {
        m_buffer = new int[SIZE_TABLE_FIRST + SIZE_TABLE_SECOND];
    }

    // consider similar setters...

    public int getFirst(int key) {
        return m_buffer[toIndex(hashFirst(key), SIZE_TABLE_FIRST, 0)];
    }

    public int getSecond(int key) {
        return m_buffer[toIndex(hashSecond(key), SIZE_TABLE_SECOND, SIZE_TABLE_FIRST)];
    }

    private static int toIndex(int hash, int mod, int offset) {
        return hash % mod + offset;
    }

    private static int hashFirst(int key) { return ...; }
    private static int hashSecond(int key) { return ...; }

    private final int[] m_buffer;
}

If this performs better than accessing into two separate arrays is dependant on your JVM however: just think about the JIT being able to merge two small allocations into a single larger one on the fly - without you having to perform any index-magic.

0人赞添加讨论(0) 举报

姐就是有狂的资本

4楼-- · 2019-07-11 06:44

You can definitely do a cuckoo hashtable with a single hash table; that is, where the two positions for each object are simply positions within a single hash table.

The only small problem to be solved is how to decide during the cuckoo eviction loop which of the two positions to use for an evicted key. Of course, you can just try one position and use the other one if the first one was the same as the actual position. It should be possible to use SIMD to compute both hashes in parallel, so the cost of this strategy might be small.

However, if you want to guarantee a single hash computation during the cuckoo loop, there is a simple solution: instead of using H₀(k) and H₁(k) as the two positions, use H₀(k) and H₀(k) xor H₁(k). (If H₁ is independent from H₀, then so is H₀ xor H₁, so the xor does not affect the distribution of hash values.) With this modification, you can always find "the other position" of k by xor'ing the current position with H₁(k), so only a single hash computation is needed in the loop.

While that allows you to use a single hash table, and may even simplify the code, there is not a lot of evidence that it improves the operation of the algorithm. In my limited testing, it seems to increase the number of loop iterations by 40-50%. (Although it needs to be emphasized that in the vast majority of cases, a new key can be inserted into the table without entering the loop at all, so the increased number of loops is hardly noticeable in the actual execution time.)

0人赞添加讨论(0) 举报

在下西门庆

5楼-- · 2019-07-11 06:44

Well, all forms of hashing are murder on caches.

Anyways you can easily combine the two into a single table. But then how do you tell whether you're on your first hash function or the second? The options are add that as metadata to every bucket, or else figure it out by running the first hash function, seeing whether you got the current location, and running the second only if you were on the first. That either requires extra space, or running more hash functions.

Splitting the table into 2 solves that problem more efficiently. And statistically you need the same number of buckets to store the same number of things whether or not the table has been split. So your whole hash table becomes smaller.

0人赞添加讨论(0) 举报

Can we implement cuckoo hashing using 1 table?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间