Possible Duplicate:
md5 decoding. How they do it?
this page suggests that a hash algorithm like md5() and sha1() can be reversed because of the huge processing power that we have nowadays. At this point i tought it was only possible with Rainbow Tables. Was i wrong?
In case Rainbow Tables is the only way to go, how someone could reverse a hash that was made with a salt?
Well, this question in general is a duplicate of This Question. However, to answer your exact questions:
At this point i tought it was only possible with Rainbow Tables. Was i wrong?
Technically, yes you are wrong. No hash function is unrecoverable, given enough processing power. The key point, is how much processing power it takes, which in most cases is far bigger than you can imagine. The reason is that the number of possible values increases exponentially at each stage of the hash cycle. For MD5, each stage (there are 64 of them) would multiply the number of possibilities by 10^77 (a lot of zeros). So to successfully reverse an MD5, you'd have to try a really large number of possible permutations (a back-of-envelope calculation shows somewhere on the order of 10^4932 tries). With the fastest super computer ever created today (about 8 petaflops, or 8x10^15 floating point operations per second), you're looking at approximately 10^4908 years to reverse it. Which incidentally is 2.5x10^4898 times the age of the universe right now. Really, it's an enormous number that's beyond our human ability to comprehend...
And that's an absolute best possible case situation.
So technically it is possible to reverse. But practically, no it is not.
In case Rainbow Tables is the only way to go, how someone could reverse a hash that was made with a salt?
The thing is that nobody needs to reverse it. They just need to find a collision. Basically, a collision is two inputs that lead to the same output. So if hash(a) = x
and hash(b) = x
, a
and b
are collisions of each other. So, all we need to do is find a collision (which believe it or not is easier than finding the exact input, since there are technically an infinite number of inputs that can give a particular output). With input the size of passwords, typically the collision is the original password.
The easiest way to find this collision is with a precomputed list of hashes (typically called a rainbow table). Basically all you need to do is then look up the hash in the table to see if the original is there. If so, you're done (that easy).
Salts are typically added to combat rainbow tables. This is because if the user enters 1234
as their password, and you use abcdefghijklmnop
as the salt, the original would be 1234abcdefgjhijklmnop
, which is significantly less likely to appear in a rainbow table. So adding a strong salt will protect against precomputed rainbow tables.
Brute Forcing
However, there is a significant concern if you just do hash(pass + salt)
. It's not susceptible to precomputed rainbow tables, but it is susceptible to brute forcing. The reason is that cryptographic hash functions (such as sha1, md5, sha256, etc) are designed to be fast. Their traditional role is in Signing, so they need to be fast to be useful. However, in password storage this is the weakness. With modern GPU's, an attacker could brute force (just try every possible password permutation) a simple hash with salt in a matter of hours (for more details, see my blog post about it)...
The best prevention
The best prevention has two features:
It's not easy to pre-compute a table of values (a rainbow table)
It's not fast to hash a single value (not easy to brute force).
As it turns out, there's a simple way of doing that using a hash function. Simply iterate over it and make the output dependent upon a large number of hash functions:
var result = password + salt;
for (var i = 0; i < 10000000; i++) {
result = hash(result + salt);
}
The key is that by making it artificially slow and using a salt, you're making it resistant to precomputation and brute forcing.
As it turns out, there are 2 standard algorithms that do this (well, use the principles).
The best one is Blowfish hash (bcrypt) which doesn't really use a hash primitive function, but uses the key derivation cycle of the Blowfish cipher. It's available in PHP via crypt()
. To use it:
$hash = crypt($password, '$2a$07$' . $salt . '$');
And verify it with
$hash == crypt($password, $hash);
The other method (which is slightly less preferred) is PBKDF2. To program it in PHP:
function pbkdf2($hashFunc, $password, $salt, $iterations, $length = 32) {
$size = strlen(hash($hashFunc, '', true));
$len = ceil($length / $size);
$result = '';
for ($i = 1; $i <= $len; $i++) {
$tmp = hash_hmac($hashFunc, $salt . pack('N', $i), $password, true);
$res = $tmp;
for ($j = 1; $j < $iterations; $j++) {
$tmp = hash_hmac($hashFunc, $tmp, $password, true);
$res ^= $tmp;
}
$result .= $res;
}
return substr($result, 0, $length);
}
Note:
None of these will protect a user against a very weak password. If they enter a dictionary word, or a common password it will still be likely that an attacker will be able to crack it. They will however add to the defense against moderate strength passwords...
Some more reading:
- Many hash iterations, append salt every time?
- Fundimental difference between hashing and encryption
- MD5 decoding, how they do it
- The Rainbow Table Is Dead
- You're storing your password incorrectly
- Password storage, salt vs multiple hashes
A rainbow table is "just" a big table of precomputed hash values with some trickery to store only a small portion of the table and still be able to lookup all values. In fine details, a rainbow table which can "invert" N possible values (i.e. there are N hash outputs for which the table will yield a corresponding input) takes time about 1.7*N to build -- so building the table is actually slower than "just" trying out the N inputs and see if one matches the given hash output. The table advantage is when you have several hash outputs for which you want to find a matching input.
Maybe you could use the following attack, adopting, the technique used to make the hash is a simple calculation.
For example, if the calculation would be made with a modular hash in 100, we have:
Example input:
8379547378
Output hash:
78
A general formula for the hash value 78 would be 78 +100*k (k belonging integers). Thus, one could try all the possible sequences. Note that this reduces the search space from 100% to 1% in this case the module 100. If it were possible to establish a hunch that this number was 10 digits, we could make the search even further reduced to 78 +100 k (10^7<=k< 10^8).
Another way would be to populate a database with a number of really great of hashes and their entrances, then search in this database.
I hope I have helped a little.
Technically any standard hashing algorithms is not reversible! so once you get the hash of a message, there should be no way to get the original message out of its hash string. The only way people try to crack it is to use brute force attack. Brute forcing is the stupidest thing you can do, trying all possible keys! That explains why one of the characteristic of a secure cryptographic algorithm is to have a large key space. But if you utilize the process in some cases it can be practical, that's what exactly rainbow tables do.
A rainbow table is a precomputed table for all possible combinations up to a certain length. That means you create all possible combination of characters(upper case and lower case), numbers and special characters up to certain length. As far as I know the most complete rainbow table can break hash of strings up to 10 characters which include numbers, uppercase and special characters, so if you have string longer than that there shouldn't be any security concern about breaking the hash itself. as you can see here the size of the table that can break vista passwords up to 8 characters is over 100GB and that number increases exponentially which makes it kind of impossible to go further that 10 or 12 characters.
As long as your string is not easy to guess, long enough and it contains upper case letters, numbers and special characters, there no need to worry :)
First, in general, it is not possible to "reverse" a cryptographic hash function. This is because these functions generally take far more input data than they output.
For instance, MD5 takes 512 input bits (64 bytes) and produces 128 bits output (16 bytes). Thus, there is simply not enough information in the input to reconstruct the output. In fact, there will be approximately 2^384 (a really big number) of different inputs that have exactly the same output hash.
Instead cryptographers talk about three different kinds of attacks on hashes:
- first preimage attack: given a hash h, find any message m such that hash(m) = h
- second preimage attack: given a fixed message m1, find any other message m2 such that hash(m1) = hash(m2)
- collision attack: find any two messages m1 and m2 such that hash(m1) = hash(m2)
Now back to this "reversing" business. When you want to "break a MD5 password", what you really want to do is a first preimage attack: find any "password" m such that hash(m) matches stored hash h. Normally this would take on the order of 2^128 guesses to do by brute force (more than all the computers on earth could manage in a century). There exist known weaknesses in MD5 that bring this down to ~2^123, which is still too hard to be practical.
But because passwords are generally short strings of letters and numbers, there are far fewer than 2^128 passwords that people are actually likely to use. There's more like 2^40 (ie around a trillion). That's still a lot, but not so many that it's impossible to try them all if you have a year or so or a lot of PS3s. But what if you knew you wanted to break a whole lot of passwords? Rather than make 2^40 guesses each time, you can store the hashes of all the likely passwords on disk for future use. This is (more or less) what a rainbow table is: someone has already done all the work so that you just lookup the answer and skip most of the work.
You are right that using a salt breaks that. Now you're back to 2^40 guesses and your roomful of PS3s again. Using a better hash function (like SHA512 or SKEIN) doesn't really change this because it doesn't change the number of likely passwords you need to try. Lesson: you need to use a really hard to guess password!
Ok, but aren't MD5 and SHA1 still considered broken? Yes, but not in ways that really matter here (yet). The exciting news in this area has all been about collision attacks, which are relevant to breaking parts of SSL security and digital signatures, but aren't relevant to breaking stored passwords. Cryptographers expect this work to lead to even better attacks before long, so it's probably not a good idea to use MD5 or SHA1 in new programs, but programs that are using MD5/SHA1 + proper salting for password storage are still fine.
The brute-forcing that page is talking about is basically the generation of a rainbow table on the fly.