Hashing of pointer values

Posted 2020-08-09 05:08

Question:

Sometimes you need to take a hash function of a pointer; not the object the pointer points to, but the pointer itself. Lots of the time, folks just punt and use the pointer value as an integer, chop off some high bits to make it fit, maybe shift out known-zero bits at the bottom. Thing is, pointer values aren't necessarily well-distributed in the code space; in fact, if your allocator is doing its job, there's an excellent chance they're all clustered close together.

So, my question is, has anyone developed hash functions that are good for this? Take a 32- or 64-bit value that's maybe got 12 bits of entropy in it somewhere and spread it evenly across a 32-bit number space.

Answer 1:

This page lists several methods that might be of use. One of them, due to Knuth, is as simple as multiplying (in 32 bits) by 2654435761, but "Bad hash results are produced if the keys vary in the upper bits." In the case of pointers, that's a rare enough situation.
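To make the multiplication concrete, here is a minimal sketch of a Knuth-style multiplicative hash applied to a pointer. The function name `hash_ptr_knuth`, the `bits` parameter, and the 2-bit shift (which assumes at least 4-byte alignment) are illustrative assumptions, not something taken from the linked page.

```c
#include <stdint.h>

/* Knuth-style multiplicative hash: multiply by 2654435761 modulo 2^32
   and keep the top `bits` bits, which are the best-mixed ones.
   `hash_ptr_knuth` and `bits` are hypothetical names for this sketch. */
static uint32_t hash_ptr_knuth(const void *p, unsigned bits)
{
    uintptr_t v = (uintptr_t)p;

    /* Assumption: allocations are at least 4-byte aligned, so the two
       lowest bits carry no information and are shifted out. */
    uint32_t k = (uint32_t)(v >> 2);

    /* Unsigned 32-bit multiplication wraps modulo 2^32. */
    uint32_t h = k * UINT32_C(2654435761);

    /* Requires 1 <= bits <= 32; the result indexes a table of 2^bits slots. */
    return h >> (32 - bits);
}
```

With `bits = 10` the result is an index into a 1024-slot table. Taking the high bits rather than the low ones matters here, because the low bits of the product depend only on the low bits of the key.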

Here are some more algorithms, including performance tests.

It seems that the magic words are "integer hashing".



Answer 2:

They'll likely exhibit locality, yes, but the variation is in the lower bits, which means objects will still be distributed through the hashtable. You'll only see a collision when two pointers' addresses differ by a multiple of the hashtable's length.
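A minimal sketch of what this answer describes, assuming the table index is simply the address modulo the table length. The name `bucket_of` and the `table_len` parameter are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Bucket index by plain modulo. Two pointers land in the same bucket
   exactly when their addresses differ by a multiple of table_len.
   `bucket_of` and `table_len` are hypothetical names for this sketch. */
static size_t bucket_of(const void *p, size_t table_len)
{
    return (size_t)((uintptr_t)p % table_len);
}
```

This works best when `table_len` shares no factor with the allocation alignment (a prime length, say). With a power-of-two length and 16-byte-aligned pointers, the low four bits are always zero and only one bucket in sixteen is used, which is why people shift out the known-zero bits first.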



Answer 3:

If you know the lowest possible pointer address (which is often the case if you're working within a large buffer), convert the pointer to an integer by subtracting that lowest value, e.g. the buffer's base address. Remember: a pointer subtracted from a pointer yields an offset (an integer). So don't "chop off" bits; it's much better to convert to an offset, which will be much smaller than the raw pointer value. In some cases it may help further to shift the value right by two bits (i.e. divide by 4) before hashing it. The problem with pointers is often that small blocks of memory are likely to be allocated at the same address (e.g. a block is freed and another block takes its place).
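The offset idea can be sketched as follows, assuming every pointer being hashed lies inside one buffer whose base address is known. The names `key_in_buffer`, `base` and `align_shift` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn a pointer into a small integer key: subtract the buffer's base
   address, then shift out alignment bits that are always zero.
   Assumption: p points into the same buffer that starts at base, so the
   pointer subtraction is well defined and non-negative. */
static uint32_t key_in_buffer(const void *p, const void *base,
                              unsigned align_shift)
{
    size_t off = (size_t)((const char *)p - (const char *)base);
    return (uint32_t)(off >> align_shift);   /* align_shift = 2 divides by 4 */
}
```

The resulting key can be used directly as an index or fed to whatever integer hash the table already uses. Note that the sketch truncates to 32 bits, so it assumes the buffer is smaller than 4 GiB times the alignment.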



Answer 4:

Why not just use an existing hash function?
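As one possible reading of this suggestion, the sketch below reuses the 64-bit finalizer ("fmix64") from MurmurHash3 as an integer mixer for the pointer value. Choosing this particular function is an assumption made for illustration, since the answer does not name one, and `hash_ptr_fmix64` is a hypothetical wrapper name.

```c
#include <stdint.h>

/* 64-bit finalizer from MurmurHash3: alternating xor-shifts and
   multiplications so that every input bit affects every output bit. */
static uint64_t fmix64(uint64_t h)
{
    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    h *= UINT64_C(0xc4ceb9fe1a85ec53);
    h ^= h >> 33;
    return h;
}

/* Hypothetical wrapper: mix the pointer's integer value and keep the
   low 32 bits of the result. */
static uint32_t hash_ptr_fmix64(const void *p)
{
    return (uint32_t)fmix64((uint64_t)(uintptr_t)p);
}
```

Because the mixer is a bijection on 64-bit values, distinct pointers only collide through the final truncation to 32 bits. Note that zero maps to zero, so a null pointer hashes to 0.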