The Question
What is a good specialization of std::hash for use in the third template parameter of std::unordered_map or std::unordered_set for a user defined type for which all member data types already have a good specialization of std::hash?
For this question, I define "good" as simple to implement and understand, reasonably efficient, and unlikely to produce hash table collisions. The definition of good does not include any statements about security.
The State of What is Google'able
At the moment, two StackOverflow questions are the first hits for a Google search of "std hash specialization".
The first, How to specialize std::hash::operator() for user-defined type in unordered containers?, addresses whether it is legal to open the std namespace and add template specializations.
The second, How to specialize std::hash for type from other library, essentially addresses that same question.
This leaves the current question. Given that implementations of the C++ Standard Library defined hash functions for primitive types and types in the Standard Library, what is a simple and effective way of specializing std::hash for user defined types? Is there a good way to combine hash functions provided by a Standard Library implementation?
(Edit thanks to dyp.) Another question on StackOverflow addresses how to combine a pair of hash functions.
The other Google results are of no more help.
This Dr. Dobbs article states that XOR of two satisfactory hashes will produce a new satisfactory hash.
This articles seems to speak from knowledge and implies many things but is light on details. It contradicts the Dr. Dobbs article in a brief remark in the first example, saying that using XOR to combine hash functions makes for a weak resulting hash function.
Because XOR applied to any two equal values results in 0, I can see why XOR by itself is weak.
The Meta Question
A well reasoned answer explaining why this question is invalid and cannot be answered generally would also be welcome.
One easy way is to use
boost::hash
library and extend it for your type. It has a nice extension functionhash_combine
(std::hash
lacks that) that allows easy composition of hashes of individual data members of your structures.In other words:
boost::hash_value
for your own type.std::hash
for your own type and implement it usingboost::hash_value
.This way you get the best of std and boost worlds, both
std::hash<>
andboost::hash<>
work for your type.A better way is to use the proposed new hashing infrastructure in N3980 Types Don't Know #. This infrastructure makes
hash_combine
unnecessary.Until we get a library in the standard to help with this:
std::hash<YourType>
, create aSpookyHash
instance, andInit
it. Note that picking a random number at either process startup orstd::hash
construction, and using that as the initialization will make it slightly harder to DoS your program, but doesn't fix the problem.operator==
("salient field"), and feed it intoSpookyHash::Update
.double
: they have 2 representations aschar[]
that compare==
:-0.0
and0.0
. Also beware types that have padding. On most machines,int
doesn't, but it's hard to tell if astruct
will. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html#is_contiguously_hashable discusses this.SpookyHash
instance. However, this requires adding a method to those structures or manually extracting the salient fields: if you can't do this, it's acceptable to just feed theirstd::hash<>
value into the top-levelSpookyHash
instance.SpookyHash::Final
fromstd::hash<YourType>
.Your operation is required to
size_t
==
operator.There is no explicit requirement that the hash values are evenly distributed over the range of
size_t
integers.cppreference.com
notes thatAvoiding hash collisions coupled with that weakness means that a specialization of
std::hash
for your types should never simply use a (fast) bitwise XOR (^
) to combine the sub hashses of your data-members. Consider this example:The hashes of
p.x
will be in the range [0,255], and so will the hashes ofp.y
. Therefore the hashes of aPoint
will also be in the range [0,255], with 256 (=2^8) possible values. There are 256*256 (=2^16) uniquePoint
objects (astd::size_t
will typically support 2^32 or 2^64 values). So the probability of a hash collision for a good hashing function should be approximately 2^(-16). Our function gives the probability of a hash collision of just under 2^(-8). This is terrible: our hash provides only 8 bits of information, but a good hash should provide 16 bits of information.If the hashing functions of your data-members provide only hash values in the low parts of the
std::size_t
range, you must "shift" the bits of the component hash before combining them, so they each contribute independent bits of information. Doing a left-wise shift looks simplebut that will discard information (due to overflow) if the implementation of
hash< uint8_t >
(in this case) attempts to spread the hash-code values over thestd::size_t
range.Accumulating component hash-code values using a multiply-by-prime-and-add method, as typically done in Java, probably works better in general:
First, the Dr. Dobbs article which says that XOR of two satisfactory hashes will produce a satisfactory hash is simply wrong. This is a good recipe for poor hashes. In general, to create a good hash, you start by decomposing your object into subobjects, each of which there exists a good hash, and combining the hashs. One simple way of doing this is something like:
(This has been designed so that you can use
std::accumulate
if you have a sequence of values.)Of course, this supposed that all of the subtypes have good implementations of
std::hash
. For the basics types and strings, this is a given; for your own types, just apply the above rule recursively, specializingstd::hash
to use theHashAccumulator
on its subtypes. For a standard container of a basic type, it's a bit trickier, because you're not (formally, at least) allowed to specialize a standard template on a type from the standard library; you'll probably have to create a class which uses ofHashAccumulator
directly, and explicitly specify that if you need a hash of such a container.