I've noticed statements such as:
"MD5 has been cracked for collisions and is no longer cryptographically secure; use SHA-1 instead."
"SHA-1 has been cracked for collisions and is no longer cryptographically secure; use SHA-2 instead."
From my current understanding, every possible hash value h(d) is equally likely over all input data d. This implies, then, that the only strengthening mechanism available to a hashing algorithm is to return longer hashes.
This also implies that all hashes (leaving hash length aside) are equally insecure against brute forcing, and that "cryptographically broken" refers only to attacks faster than brute-force search.
Is this true? What measures do modern cryptographic hashing algorithms use to prevent collision attacks?
The PhD thesis of cryptographer Bart Preneel is a good overview of this subject: Analysis and Design of Cryptographic Hash Functions.
You might be especially interested in the concept of provably secure cryptographic hash functions, which exist but are currently not considered practical.
MD5 is not collision resistant. The definition from Wikipedia states:
MD5 generates a 128-bit hash that can now be broken within seconds.
One way to prevent collisions is to increase the size of the hashing results (the output size in bits). That also allows time for cryptographers to investigate vulnerabilities. It's been ages since I've done this... but yeah, in a nutshell :-)
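As a quick illustration of those output sizes, here is a minimal Python sketch using the standard hashlib module (MD5 and SHA-1 appear only for comparison; MD5 may be unavailable on FIPS-restricted builds):

```python
import hashlib

# Output sizes, in bits, of some common hash functions.
for name in ["md5", "sha1", "sha256", "sha512"]:
    bits = hashlib.new(name).digest_size * 8
    print(f"{name}: {bits}-bit output")
```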
I only have a partial answer. =)
This is not true. The point of hashing is to get the same hash h(d) for some data d every time. The algorithm therefore cannot involve any "chance" of producing a hash. The dilemma a hash algorithm faces is that calculating h(d) from d should be easy, but calculating a valid d from h(d) should be hard.
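To make that concrete, a minimal Python sketch (using SHA-256 from the standard hashlib module):

```python
import hashlib

d = b"some data"
# Hashing is deterministic: the same d always yields the same h(d)...
assert hashlib.sha256(d).hexdigest() == hashlib.sha256(d).hexdigest()
# ...and computing h(d) from d is easy, while no known shortcut recovers
# a valid d from the digest alone.
print(hashlib.sha256(d).hexdigest())
```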
An algorithm is considered "not cryptographically secure", if calculating d from h(d) can be done faster than brute-force.
It's easy to see that not all hash algorithms have to be equally insecure by considering the hash function that returns the first several bits of d. Regardless of the number of bits you take, it's trivial to find collisions.
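For example, a sketch of such a deliberately weak "first bytes" function (a toy, not any real algorithm):

```python
def weak_hash(data: bytes, n_bytes: int = 8) -> bytes:
    # Toy "hash": just the first n_bytes of the input, zero-padded if short.
    return data[:n_bytes].ljust(n_bytes, b"\x00")

# Collisions are trivial no matter how long the output is:
# any two messages sharing a prefix collide.
assert weak_hash(b"transfer $1 to Alice") == weak_hash(b"transfer $9,999 to Eve")
```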
Yes, "cryptographically broken" means precisely that. I'm afraid I'm not an expert, so I can't answer the second question.
The statement "X hash function has been broken" means that there is a defect in the hash function's algorithm such that a collision can be generated faster than by brute force. Look at this post by Bruce Schneier - he says that a SHA-1 collision can now be generated much faster, that's all.
So yes, they are all equally insecure against brute-forcing, but that is not what the "X hash function has been broken" statement is about.
To be precise, there are three "attacks" that can be performed on a hash function h:
Preimages: given an "output" y (a sequence of bits of the right size, e.g. 128 bits for MD5), find a message m (an arbitrary sequence of bits) such that h(m) = y.
Second preimage: given a message m, find another message m', distinct from m, such that h(m) = h(m').
Collision: find two messages m and m', distinct from each other, such that h(m) = h(m').
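As an illustration, here is a hedged Python sketch brute-forcing the first two attacks against a toy hash (SHA-256 truncated to 16 bits, so the generic search finishes in seconds; nothing here is specific to SHA-256):

```python
import hashlib
from itertools import count

def h(m: bytes) -> bytes:
    # Toy hash: SHA-256 truncated to 16 bits, so brute force takes ~2^16 tries.
    return hashlib.sha256(m).digest()[:2]

# Preimage: given an output y, find some message m with h(m) == y.
y = b"\x00\x00"  # an arbitrary target output
preimage = next(m for m in (str(i).encode() for i in count()) if h(m) == y)

# Second preimage: given m0, find a different message with the same hash.
m0 = b"original message"
target = h(m0)
second = next(m for m in (str(i).encode() for i in count())
              if m != m0 and h(m) == target)

print("preimage:", preimage, "second preimage:", second)
```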
A good hash function is a function which resists those three attacks as well as can possibly be done. What is that best? It depends on the hash output size n.
Consider a theoretically perfect hash function, consisting of a black box; in the box are a gnome, some dice, and a big book. When asked for a hash value, for a given input message, the gnome uses the dice to generate a random output, which he gives back. But he also writes the message and the output in his big book. If he is asked for the hash of the same message again, then he will return the same value as previously. This setup is what theoreticians call a random oracle. The random oracle never contradicts itself, but you cannot know anything about what it will answer for a given question (an input message) until that precise question has been asked of it.
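The gnome is easy to model in code; a minimal Python sketch, with os.urandom standing in for the dice and a dict for the big book:

```python
import os

class RandomOracle:
    def __init__(self, n_bytes: int):
        self.n_bytes = n_bytes
        self.book = {}  # the big book: message -> output already rolled

    def query(self, message: bytes) -> bytes:
        if message not in self.book:
            # Roll the dice: a fresh, uniformly random output.
            self.book[message] = os.urandom(self.n_bytes)
        return self.book[message]

oracle = RandomOracle(16)  # 128-bit outputs, like MD5
# The oracle never contradicts itself: same message, same answer...
assert oracle.query(b"hello") == oracle.query(b"hello")
# ...but nothing predicts its answer for a message not yet asked.
```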
If the random oracle outputs strings of length n bits, then one can find preimages and second preimages in time O(2^n), and collisions in time O(2^(n/2)). The time is expressed here in terms of requests to the random oracle. The resistance to preimages is easy to understand: the attacker simply tries messages until, by pure luck, the oracle outputs the expected value; he cannot do better, since even the gnome does not know anything about what he will get for any new input. Due to the uniformly random nature of the oracle, the expected number of trials is 2^n. Resistance to second preimages flows along the same lines. For collisions, this is related to what is known as the birthday paradox.
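The birthday effect is easy to observe empirically. In this sketch, a toy 32-bit hash (SHA-256 truncated again) yields a collision after roughly 2^16 queries, not the ~2^32 a preimage would need:

```python
import hashlib
from itertools import count

def h(m: bytes) -> bytes:
    # Toy hash with n = 32 bits, so 2^(n/2) = 2^16 queries is within reach.
    return hashlib.sha256(m).digest()[:4]

seen = {}
for i in count():
    m = str(i).encode()
    d = h(m)
    if d in seen:  # every m here is distinct, so this is a genuine collision
        print(f"collision after {i + 1} queries: {seen[d]!r} vs {m!r}")
        break
    seen[d] = m
```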
A hash function is then deemed secure if its resistances appear to match those of the random oracle. It cannot do better; we require that it does not fare worse either.
A hash function is then said to be "broken" when someone comes up with a method for finding preimages, second preimages or collisions with a success rate higher than the generic methods which work on a random oracle. SHA-1 is said to be broken because a method for finding collisions, with cost about 2^61, was found (SHA-1 has a 160-bit output, and thus should resist collisions up to a work factor of about 2^80). Note that the full method was not run, i.e. we do not have an actual collision. 2^61 is still very high, on the fringes of the doable (it would require several thousand machines, running for several months or years). But 2^61 is half a million times smaller than 2^80, so that is a break.
Sometimes, "breaks" are found for which the attack is ludicrously expensive (e.g. an attack with cost 2^112 against a theoretical resistance of 2^168). We then talk about an "academic" break: faster than the expected resistance, but still fully undoable with Earth-based technology.
Note that not all usages of hash functions rely on resistance to collisions. Actually, most usages rely on resistance to preimages or second preimages. For those attacks, SHA-1 is currently as good as new, with its 2^160 resistance.
Conversely, even a perfect hash function with a 128-bit output should be considered weak, at least for collision resistance: the generic attack, with a work factor of 2^64, is within the realm of the technologically feasible. This is why newer hash standards use larger outputs (e.g. SHA-2 begins at 224 bits of output). A large enough output defeats generic attacks.
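To put those work factors in perspective, a back-of-the-envelope calculation, assuming a purely hypothetical rate of 10^9 hash evaluations per second on one machine:

```python
SECONDS_PER_YEAR = 3600 * 24 * 365
rate = 10**9  # assumed: one billion hash evaluations per second per machine

for bits in (128, 224, 256):
    cost = 2 ** (bits // 2)  # generic birthday collision cost, 2^(n/2)
    years = cost / rate / SECONDS_PER_YEAR
    print(f"{bits}-bit output: 2^{bits // 2} evaluations ~ {years:.2g} machine-years")
```

At that rate, 2^64 comes out to a few hundred machine-years (expensive, but conceivable with many machines in parallel), while 2^112 and beyond are out of reach of any realistic amount of hardware.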