How to create a good hash_combine with 64 bit outp

2019-02-06 22:45发布

问题:

Currently Boost has hash_combine function that outputs 32 bit unsigned integer (to be precise, size_t). Some references:

http://www.boost.org/doc/libs/1_43_0/doc/html/hash/reference.html#boost.hash_combine

http://www.boost.org/doc/libs/1_43_0/doc/html/hash/combine.html

Magic number in boost::hash_combine

I'd like to explore on how to create 64 bit version of hash_combine.

The first thing is to get golden ratio or any other irrational number in 64 bit.

The second part is to use shifts. This part rather tricky and I'd like to ask if there are best practices or guide on using shifts to get hash values? Or choosing shifts like the original code:

seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2); 

is totally random?

Also how to evaluate the output of hash_combine to make sure that it doesn't create more collisions than the original hash function hash_value?

回答1:

If you only want a hash_combine that hashes 2 64 bit values into one, and you don't need a new hash function for strings, you could just lift a tiny bit of code from CityHash, something like this (assuming size_t is a 64 bit unsigned integer, add your favorite bit of preprocessor or template trickery to validate that):

template <class T> inline void hash_combine(std::size_t& seed, const T& v)
{
    std::hash<T> hasher;
    const std::size_t kMul = 0x9ddfea08eb382d69ULL;
    std::size_t a = (hasher(v) ^ seed) * kMul;
    a ^= (a >> 47);
    std::size_t b = (seed ^ a) * kMul;
    b ^= (b >> 47);
    seed = b * kMul;
}

(I think reproducing this snippet here and elsewhere is OK because it doesn't constitute a 'substantial portion' of the CityHash code, but please check the CityHash sources & license agreement to decide for yourself)



回答2:

Read http://burtleburtle.net/bob/hash/doobs.html for some basic information on hash function design, and the rest of the articles in http://burtleburtle.net/bob/hash/ for more detailed information. CityHash was tested using http://code.google.com/p/smhasher/, and you can probably test your hash_combine using the same testsuite.

Although I'm not an expert in hashing, the designs of recent hash functions lead me to believe that the 2-shift technique boost's hash_combine() uses is no longer state-of-the-art and can be improved on.



标签: c++ boost hash