In a Linux system, passwords are stored using an MD5 hash. Why can the usage of "salt" protect the system more? Particularly, I want to make clear the following two
- The salt is said to be stored in
clear text with the hash, then how
it can prevent the attacker when the
attacker knows the salt value.
(Attacker can be the system
administrator himself who can check
/etc/shadow
.
- If the salt is generated randomly
everytime, how can the system
compare the hash to authenticate the
user?
For example, User A has user salt s1 and generate h1; h1 = md5(password.s1);
.
The next time, it uses salt s2 and the system must generate a different hash, h2 = md5(password.s2)
. Since h1 is not equal to h2, how can the system authenticate the user?
MD5 is a hash as you know, so if you give it an input, like 'PASSWORD', you get a unique (hopefully - however MD5 has collisions these days) output, like '3DE2AF...'.
Now, as you know, it's quite hard to directly reverse that, until somebody thought... wait, why don't I pre-generate all the possible combinations of hashable values until I can reverse the hash. This is called a rainbow table.
The purpose of a salt is to add arbitrary random data to the string being hashed, such that you increase the length of input to hash. This means general rainbow tables that expect to reverse just a password input to a hash won't work. Of course, rainbow tables being just reverse lookups, you could simply generate a rainbow table to compensate for all the possible password+salt outputs. This is where the increase in length comes into its own; because of the nature of reversing hashes, the disk space to generate reverses for very long hash inputs soon becomes infeasible. Alphanumeric rainbow tables for 6-8 characters are already a couple of Gigabytes; increase the length and character classes and you start to talk in multiples of 10GB.
Of course, if you're salting with 'PASSWORD' and you hash 'PASSWORD' you're hashing 'PASSWORDPASSWORD' which isn't that much more secure, so the choice of salt is important too. Ideally, you should use a random salt with each hashed string, but of course, you need to know what it is. A common technique is to derive a salt from the username or some other property unique to this case. Adding arbitrary data isn't in itself useful; having user-determined salt data now adds an additional level of complexity, meaning rainbow tables are needed with specialised searches for each user. The more you make this difficult, the more computational power is needed. That's where the battle is.
However, there are some modern techniques. I am not an expert, so I can't tell you how secure these are, but they are worth a mention. The concept is slow hashing. Basically, through compound hash functions you make it take a while to compute each hash. As such, the ability for each user to check the password now has a constant amount of time added for each password you wish to check. If you're bruteforcing, that is Bad News(tm). Similarly, if the system is well designed, if there are no shortcuts (which probably equate to weaknesses) then generating a rainbow table for a slow hash function should also take a while.
Edit more detail here. See crypt()
for the first example of this. @CodeInChaos has referenced PBKDF2 which forms part of PKCS#5. A newer development is scrypt.
As I say, I'm not an expert cryptanalyst. On the latter example, I have no particular specialist knowledge as to its suitability, I'm merely showing you where things are headed.
Edit 2 Clarified my write up of salt - I think I danced around the key issue of disk space before.
You can reverse a simple hash algorithm by brute force.
If you are using a common word for passwords, some prebuild tables (like rainbow ones) might contain them. That's why most algorithms call the hash function several times:
md5(md5(md5(password)));
Using salt gives a lot more of randomness to the generated password and thus make it less guessable. It consists of adding a random piece of string in the process
md5(md5(md5(password+string)+string)+string);
One reason could be, if two people use same password unknowingly they will generate same MD5. One of them can just see /etc/shadow and guess other guys password.
Now with salt added to each password, even same passwords generate different hashes.
When you encrypt data it can be still attacked by bruce-force attacks and rainbow attacks. In salting, at the end of the encrypted data you add some additional bits. So the attacker cannot get the original data properly.