This question is similar to this, but that one only references MD5 collision demos.
Are there any actual SHA1 collision pairs of arbitrary messages known so far ?
I'd like to use these to test how various software products (my own one and some third party) deal with it.
Doing some Google searches only turned up the oh-so prominent MD5 / SHA0 collisions and some hints on an approach to creating SHA1 collisions but I could not get my hands on any examples.
The first known collision has now been published at https://shattered.it/
As of February 23rd 2017 this answer is no longer accurate.
For more than six years, the SHA1 cryptographic hash function underpinning Internet security has been at death's door. Now it's officially dead, thanks to the submission of the first known instance of a fatal exploit known as a "collision."
There is no known collision for SHA-1 yet. Right now:
- There are some collisions on reduced versions of SHA-1, with less than the 80 rounds of the standard SHA-1.
- An algorithm has been describe, which should obtain a SHA-1 collision with a computational effort roughly equivalent to 263 invocations of SHA-1 on small messages; that's much better than generic algorithms (which require 280 invocations on average) but that's still quite big and that algorithm has not been run yet.
There was an effort to obtain a SHA-1 collision by harnessing power from whoever had some spare CPU clock cycles to donate, with the BOINC framework to organize the whole thing, but there were not enough volunteers and the effort was abandoned last year. Hence no actual SHA-1 collision yet.
Theoretical attacks rely on some assumptions which may prove to be slightly false; for instance, the attack on MD5 is actually a bit faster than expected (at some point there is a property which must be fulfilled, with a theoretical probability of 2-28, but in practice it is more like 2-27.7, i.e. the attack is 20% faster than predicted). It is still considered that the theoretical attack is correct and the complexity "rather accurate".
Google's Security Blog describes the first public, intentional SHA-1 collision here:
https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
Direct links to 2 PDFs with the same SHA-1 (from the site dedicated to this finding):
- https://shattered.it/static/shattered-1.pdf
- https://shattered.it/static/shattered-2.pdf
Again, Marc Stevens was involved along with CWI Amsterdam and some Google employees, but this time for the full-round SHA-1 on two constructed PDFs.
Stevens also notes that due to SHA-1's Merkle-Damgård construction, both 2 PDFs can be extended (appended) with the same arbitrary data to produce longer versions hashing to the same digest.
Google will apparently publish the accompanying source code in 90 days from now (February 23, 2017), giving affected system suppliers some time to update their stuff.
It remains to be seen how software like git and service providers like GitHub will deal with this, especially in terms of backwards compatibility.
Linus Torvalds has issued a statement regarding git, noting that they will migrate to newer hashes in a compatible way, but that it will take time.
By the way, the "shattered" collision demo does not affect git (without modifications), because it uses SHA-1 like this:
sha1("blob " + <size in octets as text> + "\0" + <contents>)
You can get the git hash using git hash-object <file path>
, even if the file is not in git.
In related news, Subversion seems to be the first real victim of this proof, causing repository corruption, thereby making the mentioned files practical exploits.
-- PREVIOUSLY... --
A 76-round collision was found by Marc Stevens.
Cryptographer Jean-Philippe Aumasson, co-creator of BLAKE and SipHash and initiator of the Password Hashing Competition (PHC), guesses an SHA-1 collision on the full 80 rounds will have been found by 2020.
According to ongoing research by Marc Stevens et al. published in October 2015,
... we estimate the SHA-1 collision cost today (i.e., Fall 2015) between
75K$ and 120K$ renting Amazon EC2 cloud computing over a few months.
By contrast, security expert Bruce Schneier previously projected the
SHA-1 collision cost to be ~173K$ by 2018.
They also describe a collision attack for SHA-1's compression function.
There is an example in Collision Search Attacks on SHA1 paper by Wang, Yin and Yu, from 2005, but just for weakened, 58-round version of SHA-1. (The full, official SHA-1 performs 80 rounds.)
3 A collision example for 58-step SHA1
h₁ = compress(h₀,M₀) = compress(h₀,M'₀)
_____________________________________________________
h₀: 67452301 efcdab89 98badcfe 10325476 c3d2e1f0
_____________________________________________________
M₀: 132b5ab6 a115775f 5bfddd6b 4dc470eb
0637938a 6cceb733 0c86a386 68080139
534047a4 a42fc29a 06085121 a3131f73
ad5da5cf 13375402 40bdc7c2 d5a839e2
_____________________________________________________
M'₀: 332b5ab6 c115776d 3bfddd28 6dc470ab
e63793c8 0cceb731 8c86a387 68080119
534047a7 e42fc2c8 46085161 43131f21
0d5da5cf 93375442 60bdc7c3 f5a83982
_____________________________________________________
h₁: 9768e739 b662af82 a0137d3e 918747cf c8ceb7d4
_____________________________________________________
Table 2: A collision of SHA1 reduced to 58 steps. The two
messages that collide are M₀ and M'₀. Note that padding
rules were not applied to the messages.
Not exactly SHA1 collision,
but there are collisions of PBKDF2-HMAC-SHA1 message digest authentication code.
For instance, PBKDF2(SHA1, password, salt, iterations, dkLen) of the two passwords plnlrtfpijpuhqylxbgqiiyipieyxvfsavzgxbbcfusqkozwpngsyejqlmjsytrmd
and eBkXQTfuBqp\'cTcar&g*
, salt hunter2
, 4
iterations, provide the same value (35d1c8f259129dc800ec8e073bb68f995424619c
for dkLen 20
).
In fact, it is trivial to find such collisions for strings longer than 64 bytes.
Another collision example (Python3):
>>> import hashlib, binascii
>>> def pbkdf2sha1hex(x, salt, iters):
... h = hashlib.pbkdf2_hmac('sha1', x, salt, iters)
... return binascii.hexlify(h)
>>> pbkdf2sha1hex(b'http://stackoverflow.com/questions/3475648/sha1-collision-demo-example/31136714', b'NaCl', 1000000)
b'20177527e04e05d5e7b448c1ab2b872f86831d0b'
>>> pbkdf2sha1hex(b'\x8c\xbf8\x94\xbc\xf4\xbe\x90xT,r\xbc\x03\xd1\xed\xd9\xea\xfb\x9f', b'NaCl', 1000000)
b'20177527e04e05d5e7b448c1ab2b872f86831d0b'
Please note that the same "problem" applies to PBKDF2-HMAC-SHA256 as well:
>>> h1 = pbkdf2_hmac('sha256', b'http://stackoverflow.com/questions/3475648/sha1-collision-demo-example/31136714', b'NaCl', 1000000)
b"\xcf\xc5\xee\x15=\r\x0b\x0e\x89r\x9b\xe1\xb7'+\xa4'o\x98kn++u\x12\xec\xd9\xec\xea\xebL\xb7"
>>> h2 = pbkdf2_hmac('sha256', b'.\x83\xb0D\x93D\x9f\x162\xf3\xd4x\xb6\x1a\x9f-\x1f\xdb\xdc\xa4\x8f\xb3\x95Y5\xea\x99*\x97\x00V\x81', b'NaCl', 1000000)
>>> h1 == h2
True
It all happens, because from the PBKDF2 definition, for long strings, it holds:
PBKDF2(hashalgo, s, ...) == PBKDF2(hashalgo, hashalgo(s), ...)
.
More info e.g. here: https://mathiasbynens.be/notes/pbkdf2-hmac