Why is std::hash not guaranteed to be deterministi

2020-05-24 20:56发布

问题:

Hereafter, we use N4140 (C++14 Standard).


According to § 17.6.3.4 Hash requirements,

The value returned shall depend only on the argument k for the duration of the program.

[ Note: Thus all evaluations of the expression h(k) with the same value for k yield the same result for a given execution of the program. — end note ]

and § 20.9.12 Class template hash says

...

the instantiation hash<Key> shall:

(1.1) — satisfy the Hash requirements (17.6.3.4) ...

(1.2) — ...


This means a hash value of value (i.e. hash<decltype(value)>(value)) may take a different value if you restart the program.

But why? This limitation was not in the Standard of C++11, but in the Standard of C++14, C++17 and C++20. As a user (not a STL developer), it would be quite useful if std::hash were deterministic. Are there any mathematical difficulties in implementing a deterministic hash function? But hash functions we daily use (e.g. deprecated md5sum or safer sha256) are all deterministic. Is there a problem of efficiency?

回答1:

There is no need for the hash function to be deterministic between runs, but you can still provide your own hash, e.g. for unordered containers if it's a behavior you rely on.

As for why, cppreference says:

Hash functions are only required to produce the same result for the same input within a single execution of a program; this allows salted hashes that prevent collision denial-of-service attacks.

If the Hash requirements tells it to be deterministic, then you wouldn't be able to provide a salted hash without breaking the requirement.

Here is the actual explanation why



回答2:

This answer (and links in it) suggested by @NathanOliver is ultimately helpful. Let me cite important parts.

For a non-cryptographic hash function, it's possible to pre-calculate massive inputs with the same hashed value to algorithmically slow down the unordered containers, and results in a denial-of-service attack.

(from Issue 2291. std::hash is vulnerable to collision DoS attack)

For this reason, language designers are migrating to random hashing. In random hashing, the hash value of the string “a” can change every time you run your program. Random hashing is now the default in Python (as of version 3.3), Ruby (as of version 1.9) and Perl (as of version 5.18).

(from Do you realize that you are using random hashing?)

Move to Ready, rather than Immediate, as even the permission has been contentious in reflector discussion

(from Issue 2291. std::hash is vulnerable to collision DoS attack)

In practice, as far as I understand, no implementation of std::hash implements random hashing but you can write your own my::secure_hash.

(from this answer)


P.S.

I just googled "hash table dos" and found an informative page: The moment when you realize every server in the world is vulnerable.