I have a client that's running an aggregator of information from multiple accounts. The database needs to store usernames and passwords for other websites in a way that lets a script use them later to log into those websites and retrieve data.
Rather than store them as plain text, I'm thinking we can hash them for storage. Obviously, someone could still access the plain text version if they had access to both the code and the database, but not if they only had one or the other.
Any better ideas?
If your system has a password, you can use it to generate a key to encrypt/decrypt the passwords for the other websites.
With that, the user has to enter that password before the passwords in your database can be decrypted.
A more detailed flow here:
- User enters password "123456" when logging in to your system.
- You use the SHA256 of the "123456" password and get a key: "8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92"
- Use that "8d969..." key to decrypt with AES the passwords of the sites in your database.
You can refine this in a number of ways. For instance: salting the password before computing the SHA hash.
As a practical example of such salting, Michael suggests using PBKDF2 with HMAC-SHA-256 as its two-parameter pseudorandom function.
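A minimal sketch of the key-derivation step, using only Python's standard library. The first part reproduces the unsalted SHA-256 key from the flow above; the second derives a salted key with PBKDF2-HMAC-SHA-256. The name `derive_key`, the 16-byte salt, and the iteration count are illustrative assumptions, not values from this thread. The AES step is left out because it needs a third-party library.

```python
import hashlib
import os

# Unsalted variant from the flow above: the key is simply SHA-256(password).
plain_key = hashlib.sha256(b"123456").hexdigest()


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key (the size AES-256 expects) with PBKDF2-HMAC-SHA-256.

    The default iteration count is an assumed work factor; tune it for your hardware.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random per user and is stored next to the encrypted data.
# It does not need to be secret; it only has to differ between users.
salt = os.urandom(16)
key = derive_key("123456", salt)
```

With a salt, two users who pick the same password end up with different keys, so a precomputed table of SHA-256 values no longer helps an attacker.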
Another enhancement: store an encrypted copy of the key, so that users can change their own password without re-encrypting all of their stored passwords. Only the encrypted copy of the key has to be re-encrypted under the new password.
Anything you can do to use your credentials, an attacker can as well
What does obfuscation actually buy you? Time. The data is there after all, as is your scheme for using it yourself. It's only a matter of time for someone to figure out what your scheme is. This all depends on your level of paranoia as well as an assessment of the level of risk that you're comfortable with. In particular, how you store any credentials should depend on who has access to the machine running the database.
Eventually rubber has to hit the road though, so go forth and conquer. Don't completely knock obfuscation. Just make sure to couple it with smart practices.
Approaches and Suggestions
Generate per-application API Keys for each account, to be used by the machine
If you can generate API Keys from the third party accounts, this would give you the ability to revoke access to the accounts without shutting all potential applications out. A lot of services have these types of API Keys (Google, Twitter, StackExchange, Facebook and many others).
You simply set up an "application", then use a consumer key and secret as well as an access token and access secret. The machine just has to store these credentials. If a compromise happens, you just have to revoke that set of keys. Additionally, these allow you to specify the permissions per account.
Use a per user password for their set of credentials
When a user logs in, only then do you unlock their set of passwords. To do this you would derive a key from their password with a proper hashing scheme, and store a verification value that sits several hashing steps ahead of the key, so that the stored value lets you check the password without revealing the key itself.
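One way to read "a verification check several hashing steps ahead of the key" is sketched below with Python's standard library. The key comes from PBKDF2, and the stored verifier is the key hashed a few more times. The names `enroll`, `unlock`, and `make_verifier`, the `b"verifier"` prefix, and the constants are all hypothetical choices for illustration, not part of any scheme named in this thread.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor, tune for your hardware


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive the encryption key from the login password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_verifier(key: bytes, extra_steps: int = 3) -> bytes:
    """Hash the key a few more times.

    The result can be stored: going from the verifier back to the key
    would mean inverting SHA-256.
    """
    value = key
    for _ in range(extra_steps):
        value = hashlib.sha256(b"verifier" + value).digest()
    return value


def enroll(password: str) -> Tuple[bytes, bytes]:
    """Return the (salt, verifier) pair to store in the database."""
    salt = os.urandom(16)
    return salt, make_verifier(derive_key(password, salt))


def unlock(password: str, salt: bytes, stored_verifier: bytes) -> Optional[bytes]:
    """Return the key if the password is right, otherwise None."""
    key = derive_key(password, salt)
    if hmac.compare_digest(make_verifier(key), stored_verifier):
        return key
    return None
```

The database holds only the salt and the verifier. The key exists in memory only while the user is logged in, which is the point of this approach.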
Encrypt it on disk anyway
You could always encrypt the credentials with one key. Then you only have one key to protect (that protects all the other secrets). You would then have to decrypt before accessing your other credentials.
Store the secrets in the system's keyring
On Linux, use the gnome-keyring. Then you can make simple Create-Read-Update-Delete calls, treating it as a password database. The Gnome keyring is based on the PKCS#11 standard.
The gnome-keyring has an API for saving to the keyring and retrieving items.
    #include <gnome-keyring.h>

    /* A callback called when the operation completes */
    static void
    found_password (GnomeKeyringResult res, const gchar* password, gpointer user_data)
    {
        /* user_data will be the same as was passed to gnome_keyring_find_password() */
        // ... do something with the password ...
        /* Once this function returns |password| will be freed */
    }

    static void
    find_my_password (void)
    {
        gnome_keyring_find_password (GNOME_KEYRING_NETWORK_PASSWORD, /* The password type */
                                     found_password,                 /* callback */
                                     NULL, NULL, /* User data for callback, and destroy notify */
                                     "user", "me",
                                     "server", "gnome.org",
                                     NULL);
    }
On Windows 7+, use the "Encrypting File System" (EFS) feature. All the files are encrypted with a certificate, which is in turn protected with your Windows password.
Don't let this lull you into a false sense of security, though. If this is running on a server, anyone who gets network access to the box also has access to the keyring data.
Set up a remote machine that grants access
Can you set up a machine that would grant access to credentials or an unlock key using public and private key pairs?
On hashing
If you hash the usernames and passwords, you're not getting them back. Hashes are designed to be one-way functions.
You could encode the data for obfuscation though, but I'm not recommending that.
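To make the difference concrete, here is a small standard-library sketch. Base64 stands in for "encoding" (an assumption, since no particular encoding is named above), and the sample secret is made up.

```python
import base64
import hashlib

secret = b"s3cret-password"  # made-up example value

# Encoding: anyone can reverse it, no key required.
encoded = base64.b64encode(secret)
decoded = base64.b64decode(encoded)

# Hashing: a fixed-size digest with no decode operation at all.
digest = hashlib.sha256(secret).hexdigest()

print(decoded == secret)  # the encoded form round-trips to the original
print(len(digest))        # hex digest length is fixed regardless of input size
```

Encoding hides the value from a casual glance and nothing more, while the hash cannot be turned back into a password that a login script could use.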
If you hash the information, you cannot retrieve it later. If you encrypt it, you need to store the key somewhere. There is no reliable way, except for physically restricting access to the database, to eliminate the potential for malicious use of the data.
Hashing can prevent, in all foreseeable manners, use of the original data. However, you need to use the data. Cryptographic hashes like SHA-256 are designed so that it is computationally difficult (without making unreasonably sized lookup tables) to find m given H(m), where H is your favorite hash function.
If you go the route of encryption, you need to store the encryption key and it can be compromised or at least used as a decryption oracle. You can make a service broker that does decryption for you and use both client and server authentication certificates to ensure safety. However, if someone compromises an authorized client, then you have a window of time between compromise and detection where accounts can be compromised. But this approach gives you the flexibility to revoke certificates and immediately deny the server access, even if you don't have access to the compromised client anymore.
I recommend setting up a remote service that is only reachable over a direct connection (on the same physical switch), that authenticates itself to the client, and that requires all clients to authenticate. Limiting the number of queries each client can make would also help limit abuse if a client were compromised. The service will need to check for certificate revocation on every request.
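The per-request checks described here (revocation first, then a query limit) could look roughly like the sketch below. It is a hypothetical in-memory stand-in: the class `CredentialBroker`, its methods, and the idea of identifying a client by certificate fingerprint are assumptions for illustration. Mutual TLS authentication itself is assumed to have happened already and is not shown.

```python
import time
from typing import Dict, List, Optional, Set


class CredentialBroker:
    """Hypothetical gatekeeper in front of the decryption service."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.revoked: Set[str] = set()
        self._history: Dict[str, List[float]] = {}

    def revoke(self, fingerprint: str) -> None:
        """Deny all future requests from this client certificate."""
        self.revoked.add(fingerprint)

    def authorize(self, fingerprint: str, now: Optional[float] = None) -> bool:
        """Check revocation and the query budget on every single request."""
        now = time.monotonic() if now is None else now
        if fingerprint in self.revoked:
            return False
        # Keep only the requests that are still inside the sliding window.
        recent = [t for t in self._history.get(fingerprint, [])
                  if now - t < self.window_seconds]
        allowed = len(recent) < self.max_queries
        if allowed:
            recent.append(now)
        self._history[fingerprint] = recent
        return allowed


broker = CredentialBroker(max_queries=2, window_seconds=60.0)
```

Because revocation is checked on each call rather than once per session, a revoked client is cut off on its very next request.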
This service also needs to be connected to a remote logging facility, which serves to audit the system independently. The logging service must likewise authenticate its clients and authenticate itself to them. It receives data and appends it to a log; it never allows modification or deletion. When it receives a log entry, it digitally signs the timestamped entry and enters it into an auditing container.
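A rough sketch of such an append-only log, using only Python's standard library. Two things here are assumptions rather than part of the description above. First, an HMAC tag stands in for a real digital signature, because the standard library has no asymmetric signing. Second, each entry also records the previous entry's tag, so that deletions and reordering show up during verification. The class name `AuditLog` is hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Dict, List, Optional


class AuditLog:
    """Append-only log: there is deliberately no update or delete method."""

    _FIELDS = ("timestamp", "client", "message", "prev")

    def __init__(self, signing_key: bytes) -> None:
        self._key = signing_key
        self._entries: List[Dict] = []

    def _tag(self, entry: Dict) -> str:
        # HMAC-SHA-256 over the canonical JSON form (stand-in for a signature).
        body = {name: entry[name] for name in self._FIELDS}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def append(self, client: str, message: str,
               timestamp: Optional[float] = None) -> Dict:
        entry = {
            "timestamp": time.time() if timestamp is None else timestamp,
            "client": client,
            "message": message,
            "prev": self._entries[-1]["tag"] if self._entries else "",
        }
        entry["tag"] = self._tag(entry)
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every tag and check that the chain is unbroken."""
        prev = ""
        for entry in self._entries:
            if entry["prev"] != prev:
                return False
            if not hmac.compare_digest(entry["tag"], self._tag(entry)):
                return False
            prev = entry["tag"]
        return True
```

In a real deployment the signing key would live only on the logging host, so that a compromised client or broker cannot forge or rewrite entries.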
This is similar to how certificate authorities set up their paper trails to audit certificate issuance, in order to provide the best possible recovery model for a compromise, since preventing the compromise is actually impossible.