I'm looking to create a 32-bit hash of some data objects. Since I don't feel like writing my own hash function and md5 is available, my current approach is to use the first 32 bits (i.e. first 8 hex digits) from an md5 hash. Is this acceptable?
In other words, are the first 32 bits of an md5 hash just as "random" as any other substring? Or is there any reason I'd prefer, say, the last 32 bits? Or perhaps XORing the four 32-bit substrings together?
Some preemptive clarifications:
- These hashes don't need to be cryptographically secure.
- I'm not concerned with the performance of md5; it is more than fast enough for my needs.
- These hashes just need to be "random" enough that collisions are rare.
- In this system, the number of items shouldn't exceed 10,000 (realistically it's probably not going to get half that high). So in the worst case the probability of encountering any collisions at all should be about 1% (assuming a sufficiently "random" hash is found).
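The "about 1%" figure in the last bullet can be checked with the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), where N = 2^32 is the number of possible hash values. This is only a sketch: the function name `collision_probability` is made up for illustration, and it assumes the hash output is uniformly distributed.

```python
import math


def collision_probability(n_items: int, hash_bits: int = 32) -> float:
    """Approximate chance of at least one collision among n_items
    values drawn uniformly from a space of 2**hash_bits values."""
    space = 2 ** hash_bits
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the exponent is close to zero.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


p_worst = collision_probability(10_000)   # the stated upper bound on items
p_likely = collision_probability(5_000)   # the "half that" case

print(f"10,000 items: {p_worst:.2%}")
print(f" 5,000 items: {p_likely:.2%}")
```

Under that assumption, 10,000 items come out a little above 1%, and 5,000 items at roughly a quarter of that, since the probability grows with the square of the item count.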
An old question, but one that comes up often. The answer is most certainly NO; otherwise an MD5 hash wouldn't need to be longer than 32 bits.
Regardless, an MD5 string isn't random at all - it's entirely and consistently reproducible given the same input (which is pretty much the anti-random ;-)).
Whether or not it is sufficiently unique for your purposes depends on your purpose.
For any good hash function the individual bits should be approximately random. You should therefore be safe to use just the first 32 bits of an MD5 hash.
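As a rough sketch of what truncating looks like in Python, using the standard `hashlib` module: the helper names `md5_32` and `md5_32_xor` are hypothetical, and big-endian byte order is an arbitrary choice made here so that the result matches the first 8 hex digits of the digest. The XOR variant from the question is included only for comparison.

```python
import hashlib


def md5_32(data: bytes) -> int:
    """First 32 bits of the MD5 digest, as an unsigned integer."""
    digest = hashlib.md5(data).digest()  # 16 raw bytes
    return int.from_bytes(digest[:4], "big")


def md5_32_xor(data: bytes) -> int:
    """XOR of the four 32-bit words of the MD5 digest."""
    digest = hashlib.md5(data).digest()
    result = 0
    for offset in range(0, 16, 4):
        result ^= int.from_bytes(digest[offset:offset + 4], "big")
    return result


print(f"{md5_32(b'some data object'):08x}")
print(f"{md5_32_xor(b'some data object'):08x}")
```

`md5_32` gives the same value as `int(hashlib.md5(data).hexdigest()[:8], 16)`. If each output bit is approximately random, neither variant should be noticeably better than the other for this purpose, so the plain truncation is the simpler choice.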
Alternatively you could also use CRC32 which should be much faster to compute (and the code is about 20 lines).
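In Python there is no need to write those lines yourself, since the standard `zlib` module exposes CRC-32. A minimal sketch, where the wrapper name `crc32_hash` is just for illustration:

```python
import zlib


def crc32_hash(data: bytes) -> int:
    """CRC-32 of data as an unsigned 32-bit integer."""
    # The mask is a no-op on Python 3, which already returns an
    # unsigned value; it is kept only as a defensive habit.
    return zlib.crc32(data) & 0xFFFFFFFF


print(f"{crc32_hash(b'some data object'):08x}")
```

Keep in mind that CRC32 is designed for error detection rather than for spreading arbitrary keys evenly, so it is worth checking its collision behaviour on your own data before relying on it.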
Yes. If the answer were no, MD5 wouldn't be sufficiently secure. (Sure, it has known cryptographic weaknesses, such as practical collision attacks, but I'm not aware of any statistical ones.)