I see this technique recommended in many places (including Stack Overflow), and I can't get it out of my head that this would reduce entropy! After all, you are hashing something again that has already been hashed and already carries a collision chance. Wouldn't collision chance on top of collision chance result in more collision chances? After researching, it seems I'm wrong, but why?
It does reduce entropy.
In the paper Random Mapping Statistics by Flajolet and Odlyzko, Theorem 2 shows the following:
"If an n-bit random function is iterated k times, the expected number of image points is (1 - t_k) * 2^n (for large n), where t_k satisfies the recurrence relation t_0 = 0 and t_{k+1} = e^{-1 + t_k}. From this, it can be shown that the expected number of image points is 2^{n-i+1} when a random function is iterated k = 2^i times."
Further references are as follows:
Gligoroski, D. and Klima, V., 2010, September. Practical consequences of the aberration of narrow-pipe hash designs from ideal random functions. In International Conference on ICT Innovations (pp. 81-93). Springer Berlin Heidelberg.
Bhaumik, R., Dutta, A., Guo, J., Jean, J., Mouha, N. and Nikolić, I., 2015. More Rounds, Less Security?
Dinur, I. and Leurent, G., 2014, August. Improved generic attacks against hash-based MACs and HAIFA. In International Cryptology Conference (pp. 149-168). Springer Berlin Heidelberg.
The last of those references presents two lemmas on entropy loss, showing that the observation on entropy loss also holds if k independent random functions are used instead of one random function iterated k times.
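The same entropy loss is easy to observe empirically. As a sketch (my own, using a 16-bit truncation of SHA-256 as a stand-in for a random function), iterate over the whole 16-bit domain and count how many distinct outputs survive each round, both for one iterated function and for k independent ones:

```python
import hashlib

BITS = 16
N = 1 << BITS

def h(x, salt=b""):
    # A toy "random function" on 16 bits: truncated SHA-256 of salt || x.
    digest = hashlib.sha256(salt + x.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:2], "big")

# One fixed function iterated k times over the whole domain.
values = set(range(N))
for k in range(1, 9):
    values = {h(v) for v in values}
    print(f"iterating one function, round {k}: {len(values)} of {N} points remain")

# k independent functions (different salts) lose entropy in much the same way.
values = set(range(N))
for k in range(1, 9):
    values = {h(v, salt=bytes([k])) for v in values}
    print(f"independent functions, round {k}: {len(values)} of {N} points remain")
```

After the first round roughly 63% of the points survive (the 1 - 1/e figure), and each further round trims the image a bit more, whether the rounds reuse one function or use fresh ones.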
Since you tagged md5, I'll use that as an example. Wikipedia describes a collision attack against MD5, and the example plaintexts it gives are 256 bytes long. Since the collision attack relies on a 128-byte block of data, and the hash digest is only 128 bits, there really isn't an increased risk of a collision attack succeeding beyond the first iteration. That is to say, you can't really influence the likelihood of a collision beyond the first hash.
Also consider that the entropy of the hash is the aforementioned 128 bits. Even though the complexity of the collision attack is only 2^20.96 (again from Wikipedia), it would take a great number of iterations to cause two inputs to collide. The first-glance reasoning that I think you're falling victim to is that each round of hashing compounds the collision chance of the round before it, so repeated hashing keeps stacking up collision chances.
This can be disproven by counterexample fairly easily. Consider again MD5:
MD5 any two inputs 128 times in a row and you will see that this is not true. You almost certainly won't find a single repeated hash between them: you've only produced 256 of the 2^128 possible hash values, leaving essentially the entire output space untouched. The chance of a collision at each round is effectively independent of all the other rounds.
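Here is a quick sketch of that experiment (the two seed strings are arbitrary choices of mine):

```python
import hashlib

def md5_chain(seed, rounds=128):
    # Repeatedly MD5 a value, feeding each hex digest back in as the next input.
    digests, value = [], seed
    for _ in range(rounds):
        value = hashlib.md5(value.encode()).hexdigest()
        digests.append(value)
    return digests

chain_a = md5_chain("input one")
chain_b = md5_chain("input two")

all_digests = chain_a + chain_b
print("total digests:   ", len(all_digests))        # 256
print("distinct digests:", len(set(all_digests)))   # almost certainly also 256
```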
There are two approaches to understanding why this is so. The first is that each iteration is essentially trying to hit a moving target. You could probably construct an argument from the birthday paradox that after a surprisingly small number of iterations you would expect a hash digest from one input to match a hash digest from a different input, but those matches would almost certainly occur at different steps of the iteration. And once that happens, the two chains simply repeat the same sequence of digests offset from each other, because the hash algorithm itself is deterministic, so they still won't produce the same output on the same iteration.
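You would never observe this with full MD5, but with a deliberately tiny hash you can watch the chains meet. The sketch below (my own construction: MD5 truncated to 16 bits, a handful of arbitrary seed strings) looks for a value that appears in two different chains and then checks that, from those positions onward, the chains repeat the same sequence offset from each other:

```python
import hashlib
from itertools import combinations

def tiny_hash(value):
    # MD5 truncated to 16 bits so cross-chain matches show up quickly.
    return hashlib.md5(value.encode()).hexdigest()[:4]

def chain(seed, rounds=2000):
    out, value = [], seed
    for _ in range(rounds):
        value = tiny_hash(value)
        out.append(value)
    return out

chains = {seed: chain(seed) for seed in ("alpha", "bravo", "charlie", "delta", "echo")}

for (s1, c1), (s2, c2) in combinations(chains.items(), 2):
    first_in_c2 = {}
    for j, v in enumerate(c2):
        first_in_c2.setdefault(v, j)
    hit = next(((i, first_in_c2[v]) for i, v in enumerate(c1) if v in first_in_c2), None)
    if hit:
        i, j = hit
        span = min(len(c1) - i, len(c2) - j)
        locked = all(c1[i + t] == c2[j + t] for t in range(span))
        print(f"{s1}[{i}] == {s2}[{j}] == {c1[i]}; identical (offset {j - i}) from there on: {locked}")
        break
else:
    print("no cross-chain match with these seeds; try more seeds or rounds")
```

The match positions generally differ, and once two chains meet they are forced onto the same trajectory, which is the "moving target" point above.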
The other approach is to realize that the hash function actually adds entropy while it runs. Consider that an empty string has a 128-bit digest just like any other input; that cannot happen without the algorithm contributing bits of its own along the way. The flip side is a necessary property of a cryptographic hash function: information must also be destroyed, or else the input could be recovered from the digest. For inputs longer than the digest, yes, entropy is lost on the whole; it has to be in order to fit into the digest length. But some entropy is also added.
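Both halves of that can be seen with a couple of hashlib calls (only the empty-string digest below is a known constant; the 1 KiB inputs are arbitrary choices of mine):

```python
import hashlib

# An input with no entropy at all still yields a full 128-bit digest; the
# algorithm's own constants and padding supply those bits.
print(hashlib.md5(b"").hexdigest())   # d41d8cd98f00b204e9800998ecf8427e

# Two distinct 1 KiB inputs each compress down to 128 bits, so for long
# inputs some information about the input is necessarily thrown away.
a = hashlib.md5(b"A" * 1024).hexdigest()
b = hashlib.md5(b"B" * 1024).hexdigest()
print(len(a) * 4, "bits each; digests differ:", a != b)
```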
I don't have equally precise numbers for other hash algorithms, but I think all the points I've made generalize to other hash functions and one-way / mapping functions.