Can two different strings generate the same MD5 ha

2019-01-07 05:23发布

For each of our binary assets we generate a MD5 hash. This is used to check whether a certain binary asset is already in our application. But is it possible that two different binary assets generate the same MD5 hash. So is it possible that two different strings generate the same MD5 hash?

11条回答
贼婆χ
2楼-- · 2019-01-07 05:30

Yes, of course: MD5 hashes have a finite length, but there are an infinite number of possible character strings that can be MD5-hashed.

查看更多
走好不送
3楼-- · 2019-01-07 05:32

I realize this is old, but thought I would contribute my solution. There are a 2^128 possible hash combinations. And thus a 2^64 probability of a birthday paradox. While the solution below won't eliminate possibility of collisions, it surely will reduce the risk by a very substantial amount.

2^64 = 18,446,744,073,709,500,000 possible combinations

What I have done is I put a few hashes together based on the input string to get a much longer resulting string that you consider your hash...

So my pseudo-code for this is:

Result = Hash(string) & Hash(Reverse(string)) & Hash(Length(string))

That is to practical improbability of a collision. But if you want to be super paranoid and can't have it happen, and storage space is not an issue (nor is computing cycles)...

Result = Hash(string) & Hash(Reverse(string)) & Hash(Length(string)) 
         & Hash(Reverse(SpellOutLengthWithWords(Length(string)))) 
         & Hash(Rotate13(string)) Hash(Hash(string)) & Hash(Reverse(Hash(string)))

Okay, not the cleanest solution, but this now gets you a lot more play with how infrequently you will run into a collision. To the point I might assume impossibility in all realistic senses of the term.

For my sake, I think the possibility of a collision is infrequent enough that I will consider this not "surefire" but so unlikely to happen that it suits the need.

Now the possible combinations goes up significantly. While you could spend a long time on how many combinations this could get you, I will say in theory it lands you SIGNIFICANTLY more than the quoted number above of

2^64 (or 18,446,744,073,709,551,616) 

Likely by a hundred more digits or so. The theoretical max this could give you would be

Possible number of resulting strings:

528294531135665246352339784916516606518847326036121522127960709026673902556724859474417255887657187894674394993257128678882347559502685537250538978462939576908386683999005084168731517676426441053024232908211188404148028292751561738838396898767036476489538580897737998336

查看更多
女痞
4楼-- · 2019-01-07 05:39

Just to be more informative. From a math point of view, Hash functions are not injective.
It means that there is not a 1 to 1 (but one way) relationship between the starting set and the resulting one.

Bijection on wikipedia

EDIT: to be complete injective hash functions exist: it's called Perfect hashing.

查看更多
走好不送
5楼-- · 2019-01-07 05:45

MD5 is a hash function – so yes, two different strings can absolutely generate colliding MD5 codes.

In particular, note that MD5 codes have a fixed length so the possible number of MD5 codes is limited. The number of strings (of any length), however, is definitely unlimited so it logically follows that there must be collisions.

查看更多
ら.Afraid
6楼-- · 2019-01-07 05:45

Yes, it is possible. It is called a Hash collision.

Having said that, algorithms such as MD5 are designed to minimize the probability of a collision.

The Wikipedia entry on MD5 explains some vulnerabilities in MD5, which you should be aware of.

查看更多
聊天终结者
7楼-- · 2019-01-07 05:47

For a set of even billions of assets, the chances of random collisions are negligibly small -- nothing that you should worry about. Considering the birthday paradox, given a set of 2^64 (or 18,446,744,073,709,551,616) assets, the probability of a single MD5 collision within this set is 50%. At this scale, you'd probably beat Google in terms of storage capacity.

However, because the MD5 hash function has been broken (it's vulnerable to a collision attack), any determined attacker can produce 2 colliding assets in a matter of seconds worth of CPU power. So if you want to use MD5, make sure that such an attacker would not compromise the security of your application!

Also, consider the ramifications if an attacker could forge a collision to an existing asset in your database. While there are no such known attacks (preimage attacks) against MD5 (as of 2011), it could become possible by extending the current research on collision attacks.

If these turn out to be a problem, I suggest looking at the SHA-2 series of hash functions (SHA-256, SHA-384 and SHA-512). The downside is that it's slightly slower and has longer hash output.

查看更多
登录 后发表回答