I'm storing unique user-agents in a MySQL MyISAM table so when I have to look if it exists in the table, I check the md5 hash that is stored next to the TEXT field.
User-Agents
{
id - INT
user-agent - TEXT
hash - VARCHAR(32) // md5
}
There is any way to do the same but using a 32-bit integer and not a text hash? Maybe the md5 in raw format will be faster? That will requiere a binary search.
[EDIT]
MySQL don't handle hash searches for complete case-sensitive strings?
If your field
hash
is always an MD5 value generated by PHP, then you can safely set it toCHAR(32)
. This should not impact the response time to your queries, unless you plan to have millions+ of rows, or even worst!JOIN
other tables with this field. The bottom line is that fixed width column is better than variable ones, so if you can optimize do it.Regarding changing MD5 into int values, see this question; the conclusion to this is that if you really want to change your MD5 into a 128-bit int value, you might as well use a random number instead of an MD5!
Have you tried creating a
BINARY(16)
field, and storing the result ofmd5($plaintext, true);
in it? That might work, make sure you index that field as well.Because trying to fit a 128-bit value in 32 bits doesn't make any sense...
Let MySQL do the hard work for you. Use a CHAR column and create an index on that column. You could convert and store the hash as an integer, but there's absolutely no benefit, and it may actually cause problems.
Are you REALLY sure the hashes are only 32-bit? MD5 is 128-bit. Cropping the hash to first 4 or 8 bytes would greatly increase risk of collisions.
try MurmurHash. Its a fast hashing algo thats been translated to multiple languages. It takes your input and translates it into a 32/64 bit integer hash.
You can store MD5 hash in char(32) which is a bit faster than varchar(32). It's also possible to make two BIGINT fields and keep first half of md5 hash in first field and second part in second field.