For demonstration-purposes, what are a couple examples of strings that collide when hashed? MD5 is a relatively standard hashing-option, so this will be sufficient.
问题:
回答1:
This page provides these examples of 128 byte values hashing to the same value:
d131dd02c5e6eec4693d9a0698aff95c 2fcab58712467eab4004583eb8fb7f89
55ad340609f4b30283e488832571415a 085125e8f7cdc99fd91dbdf280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e2b487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080a80d1e c69821bcb6a8839396f9652b6ff72a70
and
d131dd02c5e6eec4693d9a0698aff95c 2fcab50712467eab4004583eb8fb7f89
55ad340609f4b30283e4888325f1415a 085125e8f7cdc99fd91dbd7280373c5b
d8823e3156348f5bae6dacd436c919c6 dd53e23487da03fd02396306d248cda0
e99f33420f577ee8ce54b67080280d1e c69821bcb6a8839396f965ab6ff72a70
Note that although your question asked for "strings" which collide, MD5 is defined over binary data, so the normal text meaning of "string" doesn't really apply. Languages and libraries which allow you to take the MD5 hash of text data usually mean "encode the string in a specified encoding, then hash the result."
回答2:
The second-most interesting collision I know of is this:
(30820432)3082039ba003020102020309cfc7300d06092a864886f70d0101040500305a310b300906
0355040613025553311c301a060355040a1313457175696661782053656375726520496e632e312d
302b06035504031324457175696661782053656375726520476c6f62616c2065427573696e657373
2043412d31301e170d3038313130333037353230325a170d3039313130343037353230325a308201
1c310b300906035504061302555331493047060355040a1340692e62726f6b652e7468652e696e74
65726e65742e616e642e616c6c2e692e676f742e7761732e746869732e742d73686972742e706872
6565646f6d2e6f726731133011060355040b130a475431313032393030313131302f060355040b13
28536565207777772e726170696473736c2e636f6d2f7265736f75726365732f6370732028632930
38312f302d060355040b1326446f6d61696e20436f6e74726f6c2056616c696461746564202d2052
6170696453534c2852293149304706035504031340692e62726f6b652e7468652e696e7465726e65
742e616e642e616c6c2e692e676f742e7761732e746869732e742d73686972742e7068726565646f
6d2e6f726730820122300d06092a864886f70d01010105000382010f003082010a0282010100b2d3
2581aa28e878b1e50ad53c0f36576ea95f06410e6bb4cb07170000005bfd6b1c7b9ce8a9a3c5450b
36bb01d153aac3088f6ff84f3e87874411dc60e0df9255f9b8731b5493c59fd046c460b63562cdb9
af1ca86b1ac95b3c9637c0ed67efbbfec08b9c502f29bd83229e8e08faac1370a2587f62628a11f7
89f6dfb667597316fb63168ab49138ce2ef5b6be4ca49449e465510a4215c9c130e269d5457da526
bbb961ec6264f039e1e7bc68d850519e1d60d3d1a3a70af80320a170011791364f0270318683ddf7
0fd8071d11b31304a5daf0ae50b1280e63692a0c826f8f4733df6ca20692f14f45bed93036a32b8c
d677ae35637f4e4c9a934836d99f0203010001a381bd3081ba300e0603551d0f0101ff0404030204
f0301d0603551d0e04160414cda683faa56037f796371729de4178f1878955e7303b0603551d1f04
3430323030a02ea02c862a687474703a2f2f63726c2e67656f74727573742e636f6d2f63726c732f
676c6f62616c6361312e63726c301f0603551d23041830168014bea8a07472506b44b7c923d8fba8
ffb3576b686c301d0603551d250416301406082b0601050507030106082b06010505070302300c06
03551d130101ff04023000(300d06092a864886f70d010104050003818100a721028dd10ea2807725
fd4360158fecef9047d484421526111ccdc23c1029a9b6dfab577591dae52bb390451c3063563f8a
d950faed586cc065ac6657de1cc6763bf5000e8e45ce7f4c90ec2bc6cdb3b48f62d0feb7c5267244
edf6985baecbd195f5da08be6846b175c8ec1d8f1e7a94f1aa5378a245ae54ead19e74c87667)
which collides with this (remove the parts in parentheses):
(30820432)3082039ba003020102020141300d06092a864886f70d0101040500305a310b3009060355
040613025553311c301a060355040a1313457175696661782053656375726520496e632e312d302b
06035504031324457175696661782053656375726520476c6f62616c2065427573696e6573732043
412d31301e170d3034303733313030303030305a170d3034303930323030303030305a303c313a30
38060355040313314d443520436f6c6c6973696f6e7320496e632e2028687474703a2f2f7777772e
7068726565646f6d2e6f72672f6d64352930819f300d06092a864886f70d010101050003818d0030
818902818100baa659c92c28d62ab0f8ed9f46a4a437ee0e196859d1b3039951d6169a5e376b15e0
0e4bf58464f8a3db416f35d59b151fdbc438527081975e8fa0b5f77e39f032ac1ead44d2b3fa48c3
ce919becf49c7ce15af5c8376b9a83dee7ca2097314273159168f488aff92828c5e90f73b0174b13
4c9975d044e67e086c1af24f1b410203010001a382022430820220300b0603551d0f0404030201c6
300f0603551d130101ff040530030101ff301d0603551d0e04160414a704601fab724308c57f0890
55561cd6cee638eb301f0603551d23041830168014bea8a07472506b44b7c923d8fba8ffb3576b68
6c308201be06096086480186f842010d048201af168201ab33000000275e39e089610f4ea3c5450b
36bb01d153aac3088f6ff84f3e87874411dc60e0df9255f9b8731b5493c59fd046c460b63562cdb9
af1ca8691ac95b3c9637c0ed67efbbfec08b9c502f29bd83229e8e08faac1370a2587f62628a11f7
89f6dfb667597316fb63168ab49138ce2ef5b6be4ca49449e465110a4215c9c130e269d5457da526
bbb961ec6264f039e1e7bc68d850519e1d60d3d1a3a70af80320a170011791364f0270318683ddf7
0fd8071d11b31304a5dcf0ae50b1280e63692a0c826f8f4733df6ca20692f14f45bed93036a32b8c
d677ae35637f4e4c9a934836d99f0203010001a381bd3081ba300e0603551d0f0101ff0404030204
f0301d0603551d0e04160414cda683faa56037f796371729de4178f1878955e7303b0603551d1f04
3430323030a02ea02c862a687474703a2f2f63726c2e67656f74727573742e636f6d2f63726c732f
676c6f62616c6361312e63726c301f0603551d23041830168014bea8a07472506b44b7c923d8fba8
ffb3576b686c301d0603551d250416301406082b0601050507030106082b06010505070302300c06
03551d130101ff04023000(300d06092a864886f70d010104050003818100a721028dd10ea2807725
fd4360158fecef9047d484421526111ccdc23c1029a9b6dfab577591dae52bb390451c3063563f8a
d950faed586cc065ac6657de1cc6763bf5000e8e45ce7f4c90ec2bc6cdb3b48f62d0feb7c5267244
edf6985baecbd195f5da08be6846b175c8ec1d8f1e7a94f1aa5378a245ae54ead19e74c87667)
Those are two X.509 certificates of which only the first one was actually signed by the Certificate Authority. The first part is just a header, but the last part (which you will note is the same in the two certificates) is an RSA signature of the MD5 hash of the colliding messages. This means that the second (fake) certificate will validate as having been signed by the Certificate Authority's private RSA key.
This attack involved more than 200 Playstation 3 to prepare the attack and some clever timing on the part of the attackers. For more details see: MD5 considered harmful today.
The most interesting collision I know of is the one used in the Flame espionage malware. Using a different, but similar, technique, an advanced persistent threat (most probably a western intelligence agency) created a fake code signing certificate that claimed to have been signed by Microsoft. See for instance this article. Unfortunately, I don't have access to the actual certificates and the actual MD5-collision.
回答3:
If the purpose is to demonstrate why or how hash functions result in collisions, and not MD5 specifically, the clearest hash function I know of is simple modular division. Suppose you are storing values in an array of size 10. To find the index to store a particular value x, the index = x % 10. It is obvious that there will be a lot of collisions in this tiny hash table, since for example, 'A' (65) and 'K' (75) will both hash to 5. MD5 can produce 2128 distinct values, so collisions are much less probable but still possible.
回答4:
Søren Steffen Thomsen released a md5 collision finder utility written in C. Might be fun to play with.