I have just read many, many articles on SO about hashing passwords with salt but I just cannot find an answer to the particular query/confusion I have.
Let's say I have just done this method for adding a password and salt to the DB:
- Create a random salt
- Hash the user's password + salt together
- Store the hash output as the password in column 'password'
- Store the random salt in column 'salt'
If this is correct, what happens when an attacker gets access to my DB? Surely if they can read what the salt value is for that hashed password, they can work out what the hashed password is without the salt and then use a rainbow table? Or is it a good idea to encrypt the salt value too with something that is reversible?
If the salt value is stored in plain-text, I just cannot see the point of it. Please enlighten me?
Let's say you didn't use a salt and an attacker got your hashes. All she'd need to do is compare the hashes to a lookup table and see if any of the hashes are for known passwords. Let's say the table has a million passwords in it. She can very efficiently check all your hashes against a million possible passwords.
Now let's say the same attacker got your hashes, but they are salted. For each hash she wants to examine, she'll need to take the candidate password, apply the salt, compute a new hash, and compare it to the hash you have stored. Now she has to do a ton of calculations and it's not as efficient. (Alternatively, she could have a lookup table with every possible salt in it, but OK, then she needs to have a lookup table that is orders of magnitude larger than the one without salts.)
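To make the difference concrete, here is a minimal sketch in Python. The tiny password list, the helper names (`sha256_hex`, `crack_salted`) and the choice of plain SHA-256 are assumptions for illustration only; a real lookup table would hold millions of entries.

```python
import hashlib
import os


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# The attacker's precomputed table: hash -> password.
# It is built once and reused against every leaked database.
common_passwords = [b"123456", b"password", b"letmein"]
lookup_table = {sha256_hex(p): p for p in common_passwords}

# Unsalted: one dictionary lookup recovers the password.
unsalted_hash = sha256_hex(b"letmein")
recovered_unsalted = lookup_table.get(unsalted_hash)

# Salted: the stored hash no longer appears in the precomputed table.
salt = os.urandom(16)
salted_hash = sha256_hex(b"letmein" + salt)
recovered_from_table = lookup_table.get(salted_hash)


def crack_salted(stored_hash: str, salt: bytes, candidates):
    """Rehash every candidate with this user's salt and compare."""
    for candidate in candidates:
        if sha256_hex(candidate + salt) == stored_hash:
            return candidate
    return None


# The attack still works, but the hashing has to be redone for each salt.
recovered_by_brute_force = crack_salted(salted_hash, salt, common_passwords)
```

Note that the salted password is still recoverable; the salt only forces the work to be repeated per user instead of being done once up front.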
It's all about making the amount of resources required to crack the hashes more than it's worth to the attacker.
The steps you outline are correct.
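As a rough sketch, the steps from the question might look like this in Python. The function names and the dict standing in for a DB row are hypothetical, and plain SHA-256 is used only to mirror the steps as written; a deliberately slow scheme such as bcrypt is the better choice in practice.

```python
import hashlib
import hmac
import os


def create_record(password: str) -> dict:
    """Build the values for the 'password' and 'salt' columns."""
    salt = os.urandom(16)  # create a random salt
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    return {"password": digest, "salt": salt.hex()}


def verify(password: str, record: dict) -> bool:
    """Recompute the hash with the stored salt and compare."""
    salt = bytes.fromhex(record["salt"])
    digest = hashlib.sha256(password.encode("utf-8") + salt).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(digest, record["password"])
```

The salt has to be readable at login time so the hash can be recomputed, which is why it is stored next to the hash in plain text.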
If the attacker accesses your database, he has to do a brute force search of the possible passwords plus the random salt. If you use a 64-bit reasonably random salt, then there won't be two entries using the same salt, so any rainbow table attack only works for (at most) one salt value at a time, which makes the rainbow table attack too expensive to be worthwhile. (You can even check to ensure that there is no other password using a given salt when you establish the salt for a user.)
The point of the salted hashed password process is to make it computationally infeasible to precompute possible password hashes, because the random salt screws up the precomputation process.
It also means that if the same password is used at different sites, it won't be obvious by simply looking at the (salted hashed) password values - because the salts will be different at the different sites, so the resulting hash value will be different. (Of course, if the password is discovered for one site, then the attacker will try that password first at the next site; it is still best not to use the same password in multiple locations. But the fact that the same password is in use is hidden.)
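A small illustration of that last point, assuming SHA-256 and 64-bit salts from `os.urandom` (the site names and the sample password are made up):

```python
import hashlib
import os


def salted_hash(password: bytes, salt: bytes) -> str:
    """Hex-encoded SHA-256 of password + salt."""
    return hashlib.sha256(password + salt).hexdigest()


password = b"correct horse"

# Each site generates its own 64-bit random salt.
salt_site_a = os.urandom(8)
salt_site_b = os.urandom(8)

# Same password, different salts, unrelated-looking hashes.
hash_site_a = salted_hash(password, salt_site_a)
hash_site_b = salted_hash(password, salt_site_b)
```

Comparing `hash_site_a` with `hash_site_b` tells an observer nothing about whether the two accounts share a password.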
If the attacker gets to the database then all bets are off, but as far as salt ...
The point of a salt is not to be secret but rather to thwart rainbow attacks -- a rainbow attack is one which is done through a rainbow table. And a rainbow table is just pre-generated hashes of millions and millions of passwords (it's a space-time tradeoff). The introduction of a salt invalidates these precomputed hashes: a rainbow table must be made for each unique salt.
So...
Now, if the attacker is assumed to have the database then there is another problem: the attack speed is not limited, and this is why a password hashing scheme like bcrypt, or hashing with multiple rounds, is valuable. It can slow down the attack speed from hundreds of millions of hashes per second (MD/SHA and friends were made to be fast) to, say, a few hundred per second (on the same hardware)... Also consider an HMAC approach, which also incorporates a server-secret (making it effectively password+salt+secret).
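Here is one way the "multiple rounds plus server secret" idea could be sketched using only the Python standard library. PBKDF2 stands in for bcrypt here because bcrypt needs a third-party package; `SERVER_SECRET`, `ITERATIONS` and the function names are hypothetical, and the iteration count is illustrative rather than a recommendation.

```python
import hashlib
import hmac
import os

# Hypothetical server-side secret. In practice it would be loaded from
# configuration kept outside the database, not generated on each start.
SERVER_SECRET = os.urandom(32)
ITERATIONS = 100_000  # illustrative; tune to your hardware


def slow_hash(password: str, salt: bytes) -> bytes:
    """Key the password with the server secret, then stretch it."""
    keyed = hmac.new(SERVER_SECRET, password.encode("utf-8"),
                     hashlib.sha256).digest()
    # Many rounds of HMAC-SHA256 make every guess expensive.
    return hashlib.pbkdf2_hmac("sha256", keyed, salt, ITERATIONS)


def check(password: str, salt: bytes, stored: bytes) -> bool:
    """Recompute the slow hash and compare in constant time."""
    return hmac.compare_digest(slow_hash(password, salt), stored)
```

An attacker who has only the database now needs the server secret as well, and even with it each guess costs `ITERATIONS` hash rounds instead of one.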
(I would just use an existing system that already addresses all these issues, and more :-)
Happy coding.