I have some boolean arrays that their sizes are not constant, And I need a strong and fast hash algorithm to give minimum chance of hash collision for them.
My own idea was calculating the integer value of each boolean array but for example these 2 arrays will give same hash of 3:
[0 , 1, 1] and [1, 1]
I thought to multiply the size of array after calculating integer value, but this idea also sucks, because there is a high chance of hash collision.
Does anyone has a good idea?
My ideas:
Approach #1:
Calculate the first
2n
prime numbers, wheren
is the length of the array.Let hash = 1.
For i = 0 to n: If a bit at position
i
is 1, multiplyhash
by the2i
th and2i + 1
st prime. If it's 0, multiply it by the2i
th one only.Approach #2:
Treat the binary arrays as ternary. Bit is 0 => ternary digit is 0; bit is 1 => ternary digit is 1; bit is not present => ternary digit is 2 (this former works because the array has a maximal possible length).
Calculate the ternary number using this substitution - the result will be unique.
Here's some code showing the implementation of these algorithms in C++ and a test program which generates hashes for every boolean array of length 0...18. I use the C++11 class
std::unordered_map
so that each hash is uniqued. Thus, if we don't have any duplicates (i. e. if the hash function is perfect), we should get2 ^ 19 - 1
elements in the set, which we do (I had to change the integers tounsigned long long
on IDEone, else the hashes weren't perfect - I suspect this has to do with 32 vs. 64 bit architectures):A simple an efficient hashcode is replacing 0 and 1 with prime numbers and do the usual shift-accumulator loop:
Here 0 is treated as 3 and 1 is treated as 5, so that leading zeros are not ignored. The multiplication by 31 makes sure that order matters. This isn't cryptographically strong though: given a hash code for a short sequence it's simple arithmetic to reverse it.
Insert a sentinel
true
element at the start of the array, then interpret the array as a binary number. This is a perfect hash (no collisions) for arrays with less than 32 elements. For larger arrays I suggest doing the arithmetic modulo a large prime less than 231.Examples:
This is the same as interpreting the array as a binary number and taking the bitwise OR with
1 << n
, wheren
is the size of the array.Implementation: