Assuming that we have a set of elements and want to store them in a hash map (for example std::unoredered_set
), and each element has a key of type uint64_t
which value can vary from 0 to its max possible value, is it the best choice to use trivial hash function, where a hash value of a key is a key itself? Does it depend on container in use (i.e. Google's sparse hash vs unordered map from STL)? The probability of appearance of key values is unknown.
问题:
回答1:
If all you have to hash is a uint64_t of any possible value with unknown probabilities, and your output must be a uint64_t, then you don't gain any advantage by changing the value. Just use the key itself.
If you knew something about the distribution of your values or your values were restricted to a smaller range (which is really the same thing as knowing about the distribution), then it could be beneficial to apply a transformation to the key, but this depends on the implementation of the container. You would only benefit by reducing collisions when the table transforms a hash into a bucket index, but that depends both on the table's algorithm and the current/average state of the table (how often each bucket is used).
回答2:
I would suggest a good 64-bit mixer of which there are many to choose from. The finalizer from MurmerHash3 is fairly quick and does a reasonable job in just five lines of code:
key ^= key >> 33;
key *= 0xff51afd7ed558ccd;
key ^= key >> 33;
key *= 0xc4ceb9fe1a85ec53;
key ^= key >> 33;
Numerical Recipes, 3rd Edition, recommends this:
public static UInt64 Next( UInt64 u )
{
UInt64 v = u * 3935559000370003845 + 2691343689449507681;
v ^= v >> 21;
v ^= v << 37;
v ^= v >> 4;
v *= 4768777513237032717;
v ^= v << 20;
v ^= v >> 41;
v ^= v << 5;
return v;
}
回答3:
HashMaps are very useful in giving you fast access to data. The complexity of hash operations are O(1)
which are very fast compared to other well-known structures such as double linked lists, linked lists or trees.
To use uint64_t
as a key for the hash, you can use hash containers such as GHASHLISH
The GLIB
library is thread-safe and used by multiple open source projects. It does not support uint64_t
as a key natively, so you have to provide your own hash_function.
As an example, you can use FNV hash
Here's a quick example of how to hash uint64
to uint32
using FNV:
#define FNV_offset_basis 2166136261
#define FNV_prime 16777619
guint c_uint64_t_hash(gpointer data)
{
uint8_t* v =(uint8_t*)data;
guint hash = FNV_offset_basis;
for(int i=0;i<8;i++)
{
hash = hash ^ v[i];
hash = hash * FNV_prime;
}
return hash;
}