Is the floating point specialisation of std::hash
(say, for double
s or float
s) reliable regarding almost-equality? That is, if two values (such as (1./std::sqrt(5.)/std::sqrt(5.))
and .2
) should compare equal but will not do so with the ==
operator, how will std::hash
behave?
So, can I rely on a double
as an std::unordered_map
key to work as expected?
I have seen "Hashing floating point values" but that asks about boost; I'm asking about the C++11 guarantees.
std::hash
has same guarantees for all types over which it can
be instantiated: if two objects are equal, their hash codes will
be equal. Otherwise, there's a very large probability that they
won't. So you can rely on a double
as a key in an
unordered_map
to work as expected: if two doubles are not
equal (as defined by ==
), they will probably have a different
hash (and even if they don't, they're different keys, because
unordered_map
also checks for equality).
Obviously, if your values are the results of inexact
calculations, they aren't appropriate keys for unordered_map
(nor perhaps for any map).
Multiple problems with this question:
The reason that your two expressions don't compare as equal is NOT that there are two binary expressions of 0.2, but that there is NO exact (finite) binary representation of 0.2
, or sqrt(5)
! So in fact, while (1./std::sqrt(5.)/std::sqrt(5.))
and .2
should be the same algebraically, they may well not be the same in computer-precision arithmetic. (They aren't even in pen-on-paper arithmetic with finite precision. Say you are working with 10 digits after the decimal point. Write out sqrt(5)
with 10 digits and calculate your first expression. It will not be .2
.)
Of course you have a sensible concept of two numbers being close. In fact you have at least two: One absolute (|a-b| < eps
) , one relative. But that doesn't translate into sensible hashes. If you want all numbers within eps
of each other to have the same hash, then 1, 1+eps, 1+2*eps, ...
would all have the same hash and therefore, ALL numbers would have the same hash. That is a valid, but useless hash function. But it is the only one that satisfies your requirement of mapping nearby values to the same hash!
There is no rigorous concept of "almost equality". So behavior can't be guaranteed in principle. If you want to define your own concept of "almost equal" and construct a hash function such that two "almost equal" floats have the same hash, you can. But then it will only be true for your particular notion of "almost equal" floats.
Behind the default hashing of an unordered_map
there is a std::hash
struct which provides the operator()
to compute the hash of a given value.
A set of default specializations of this templates is available, including std::hash<float>
and std::hash<double>
.
On my machine (LLVM+clang) these are defined as
template <>
struct hash<float> : public __scalar_hash<float>
{
size_t operator()(float __v) const _NOEXCEPT
{
// -0.0 and 0.0 should return same hash
if (__v == 0)
return 0;
return __scalar_hash<float>::operator()(__v);
}
};
where __scalar_hash
is defined as:
template <class _Tp>
struct __scalar_hash<_Tp, 0> : public unary_function<_Tp, size_t>
{
size_t operator()(_Tp __v) const _NOEXCEPT
{
union
{
_Tp __t;
size_t __a;
} __u;
__u.__a = 0;
__u.__t = __v;
return __u.__a;
}
};
Where basically the hash is built by setting a value of an union to the source value and then getting just a piece which is large as a size_t
.
So you get some padding or you get your value truncated, but that doesn't really matter because as you can see the raw bits of the number are used to compute the hash, this means that it works exactly as the ==
operator. Two floating numbers, to have the same hash (excluding collision given by truncation), must be the same value.