I am using Hive 0.13.1 and hashing a combination of keys with Hive's default hash function.
Something like: select hash(date, token1, token2, parameters["a"], parameters["b"], parameters["c"]) from table1;
I ran it on 150M rows. For 60% of the rows, the hash was computed correctly. For the remaining rows, it returned 0, null, or 1 as the hash. I looked at the rows that produced the bad hashes and I don't see anything wrong with them. What could be causing this?
The hash function returns 0 only when all supplied arguments are blank or null.
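To make that concrete, here is a minimal standalone sketch. It does not call Hive. It assumes the function folds the per-field hash codes together the way java.util.List.hashCode does (r = r * 31 + h, but starting from 0), that a null field contributes 0, and that a string is hashed over its bytes so an empty string also contributes 0. The class name HiveHashSketch and the helper fieldHash are hypothetical, and only string fields are modelled.

```java
import java.nio.charset.StandardCharsets;

public class HiveHashSketch {

    // Hypothetical stand-in for the per-field hash (an assumption, not Hive code):
    // null contributes 0, and a string is hashed over its UTF-8 bytes,
    // so an empty string also contributes 0.
    public static int fieldHash(String value) {
        if (value == null) {
            return 0;
        }
        int r = 0;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            r = r * 31 + b;
        }
        return r;
    }

    // Assumed combination step: fold the field hashes as List.hashCode does,
    // but starting from 0 instead of 1.
    public static int hash(String... fields) {
        int r = 0;
        for (String field : fields) {
            r = r * 31 + fieldHash(field);
        }
        return r;
    }

    public static void main(String[] args) {
        // All fields null or empty: every term is 0, so the result is 0.
        System.out.println(hash(null, "", null));   // prints 0
        // A single one-character field hashes to that character's byte value.
        System.out.println(hash("a"));              // prints 97
        // Leading nulls add nothing; only the non-empty field counts.
        System.out.println(hash(null, null, "a"));  // prints 97
    }
}
```

Under these assumptions, a row hashes to 0 when every key passed to hash is null or empty, for example when the map lookups such as parameters["a"] find no entry and the other columns are empty.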
If you are familiar with Java, you can check the implementation of the hash function.
The hash function internally uses ObjectInspectorUtils.hashCode to get the hash code for each supplied field. To test this issue manually, you can call that method from a small Java program, with the required Hive libraries added as Maven dependencies.