I'm storing unique user-agents in a MySQL MyISAM table so when I have to look if it exists in the table, I check the md5 hash that is stored next to the TEXT field.
User-Agents
{
id - INT
user-agent - TEXT
hash - VARCHAR(32) // md5
}
There is any way to do the same but using a 32-bit integer and not a text hash? Maybe the md5 in raw format will be faster? That will requiere a binary search.
[EDIT]
MySQL don't handle hash searches for complete case-sensitive strings?
You could do this instead:
It's unlikely that you'll get collisions with crc32 for this kind of data.
To guarantee that collisions will not cause problems, add a secondary search parameter. MySQL will be able to use the index to quickly find the record. Then it can do a simple string search to guarantee that match is correct.
PS: The sprintf() is there to work around signed 32-bit integers. Should be unnecessary on 64-bit systems.
You can't store an MD5 hash in a 32-bit int: it simply won't fit. (It's 32 characters when written in hex, but it's 128-bits of data)
You could look at MySQL's
BINARY
andVARBINARY
types. See http://dev.mysql.com/doc/refman/5.1/en/binary-varbinary.html. These types store binary data. In your case,BINARY(16)
orVARBINARY(16)
, but since MD5 hashes are always 16 bytes, the latter seems a bit pointless.Store the
UNHEX(MD5($value))
in aBINARY(16)
.