hash collision and appending data

2019-06-08 01:12发布

问题:

Assume I have two strings (or byte arrays) A and B which both have the same hash (with hash I mean things like MD5 or SHA1). If I concatenate another string behind it, will A+C and B+C have the same hash H' as well? What happens to C+A and C+B?

I tested it with MD5 and in all my tests, appending something to the end made the hash the same, but appending at the beginning did not.

Is this always true (for all inputs)?

Is this true for all (well-known) hash functions? If no, is there a (well-known) hash function, where A+C and B+C will not collide (and C+A and C+B do not either)?

(besides from MD5(x + reverse(x)) and other constructed stuff I mean)

回答1:

This depends entirely on the hash function. Also, the probability that you have those collisions is really small.



回答2:

Details depend on the hash function H, but generally they work as follows:

  1. Consume a block of input X (say, 512 bits)
  2. Break the input into smaller pieces (say, 32 bits) and update hash internal state based on the input
  3. If there's more input, go to step 1
  4. At the end, spit the internal state out as the hash value H(X)

So, if A and B collide i.e. H(A) = H(B), the hash will be in the same state after consuming them. Updating the state further with the same input C can make the resulting hash value identical. This explains why H(A+C) is sometimes H(B+C). But it depends how A's and B's sizes are aligned to input block size and how the hash breaks the input block internally.

C+A and C+B can be identical if C is a multiple of the hash block size but probably not otherwise.



回答3:

The hash functions being discussed here are typically cryptographic (SHA1, MD5). These hash functions have an Avalanche effect -- the output will change drastically with a slight change in the input.

The prefix and suffix extension of C will effectively make a longer input. So, adding anything to the front or rear of the input should change the effective hash outputs significantly.

I do not understand how you did the MD5 check, here is my test.

echo "abcd" | md5sum
70fbc1fdada604e61e8d72205089b5eb

echo "0abcd" | md5sum
f5ac8127b3b6b85cdc13f237c6005d80

echo "abcd0" | md5sum
4c8a24d096de5d26c77677860a3c50e3

Are you saying that you located two inputs which had the same MD5 hash and then appended something to the end or beginning of the input and found that adding at the end resulted in the same MD5 as that for the original input?

Please provide samples with your test results.