Why not use MD5 for password hashing?

2020-01-26 09:59发布

问题:

I have a friend which is a white hat hacker. He says that md5 is not really that bad and actually is really secure, just if we use it properly.

I believe that he is right. As I know, there is 3 ways to break hashes:

  1. Using Rainbow tables (Which can be secured against by a long/random salt)
  2. Collision (Which can be prevented by multiple salts or hashes - as in example bellow)
  3. Generation time (Which is not much important if we use a long enough salt value per each user - AFAIK)

I and my friend believe that Blowfish is not really needy, also it can be harmful because it can slow down password validation process and it can be used with DDOS attacks to break down server even with fewer attak-resources.

So, I want to make sure that is following algorithm really secure or not? And, is there a real reason to go with Blowfish hash algorithm or not?

// return a 256 bit salt + 128 bit md5 binary hash value
function hash(password, salt=null)
{
    salt = (salt != null) ? salt : Random256BitBinaryValueGenerator();
    // What about using another user-specified parameter, like email address as salt?

    return salt + md5(salt + password) + md5(password + salt);

    // Or just use a non-cryptographic hash algorithm like crc32 to prevent collisions:
    // return salt + md5(salt + password) + crc32(salt + password);

    // Or even use two different salts:
    // return salt + md5(salt + password) + md5('C' + salt + password);
}

// check password
function check(password, hash_value)
{
    return hash(password, substring(hash_value, 0, 32)) == hash_value;
}

回答1:

The collision resistance property of MD5 has been broken for a long time. Note that that preimage resistance and second preimage resistance have not yet been cracked, however as there are better algorithms out there (SHA-2) it would be wise to move to these rather than relying on a cryptographic hash that has already begun to lose its cryptographic properties. Note: The collision resistance property does not matter when storing hashed passwords - what you need to make sure of is that the preimage resistance property is sound - that it is computationally infeasible to find the original password given a certain hash value (and salt). As I mentioned, since one of the cryptographic properties is already broken I would be concerned that the others will follow soon.

When you store a password hash, you should build in some protection that the original password cannot be retrieved in case an attacker manages to extract these hashes. This is important because if an attacker manages to retrieve the password table only they can then use the data to either log into your system directly, or to log into other systems where the user has reused the same password.

When storing passwords, it is important to use a slow algorithm, such as bcrypt, scrypt or pbkdf2. A legitimate user should only have to experience the delay once, upon first login. An attacker will have to experience the delay for every password that they guess - remember rainbow tables will not be being used here because the passwords are salted. The attacker will be hashing each password guess in line with your chosen algorithm and iteration count.

It is important to tune the number of iterations for your system just so the right "strength" is used not to cause legitimate users any real annoyance when logging into your system. This is known as number of "rounds" or the "iteration count". For example, iterating for about one second should be sufficient. It may be safe to assume that an attacker can run through hashes at ten times the speed of your system's hardware. Therefore this limits the attacker to 10 guesses per second, rather than two billion with MD5.

Regarding DoS attacks

Yes, the extra processing your application performs upto login could be a target for an attacker to submit either really long passwords to your application, or to repeatedly hit it with login requests in order to consume CPU and memory resources on your server. You are right to be concerned.

These type of attacks can be mitigated in the following ways:

  • Log the username and IP address of each login attempt. After say 6 failed attempts, introduce a delay in response from your application if that username or IP is repeated again. This will also help mitigate password guessing attacks in general.
    • For example you could artificially delay by 1 second, then 2 seconds then 4 and up to a reasonable value (e.g. 16 seconds).
    • This has the advantage that an attacker can't lock out another account on purpose as the legitimate user only has to wait 16 seconds.
    • An attacker could use a botnet and random usernames to bypass these checks, however they would need a huge number of IP addresses than they would without this control, and a more casual attacker would also not be aware that the delay in response was artificial.
  • Monitor the number of login attempts on your system. Once this is above a set threshold rate (e.g. 10 per second), introduce a CAPTCHA to solve to continue the login process. The threshold rate you choose very much depends on the user base and capacity of your system.
  • Implement two factor authentication. Only proceed to validate password by hashing once the One Time Password has been validated.


回答2:

The problem with MD5 is exactly that it is so fast, you can calculate about 9 Giga MD5/s with common hardware. To brute-force a whole english dictionary with about 200000 words you need only a fraction of a milli-second.

This is why appropriate hash algorithms like BCrypt offer a cost factor. The cost factor defines how much time is needed to calculate the hash and can be inreased in future. 50 milliseconds for a login is hardly an obstacle, but for brute-forcing it is deadly.



回答3:

You speak of slowing down validation as a problem but it is the only defense against a leaked hash and a brute force attack. Modern solutions hash the value repeatedly (ie: thousands of times) just to raise the cost of the calculation.