What kind of hashing algorithm is used in the built-in HASH() function?
I'm ideally looking for a SHA512/SHA256 hash, similar to what the SHA() function offers within the LinkedIn DataFu UDFs for Pig.
As of Hive 2.1.0 there is a mask_hash function that will hash string values. For Hive 2.x it uses MD5 as the hashing algorithm. This was changed to SHA-256 for Hive 3.x.
The HASH function (as of Hive 0.11) uses an algorithm similar to java.util.List#hashCode. Its code looks like this:
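The original snippet is not reproduced here, so what follows is only a minimal sketch of a `java.util.List#hashCode`-style loop, not Hive's actual source. The class name `HiveHashSketch` is hypothetical, the seed of 0 (where `List#hashCode` starts from 1) is my recollection of Hive's variant, and using Java's `hashCode()` per item is a simplification, since Hive computes per-column hashes through its own type handling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Hive.
class HiveHashSketch {

    // Multiply-by-31 accumulation in the style of java.util.List#hashCode.
    // Assumption: the seed is 0 here, whereas List#hashCode starts from 1.
    static int hash(List<?> items) {
        int hashCode = 0;
        for (Object item : items) {
            // Nulls contribute 0; int overflow wraps around, as in List#hashCode.
            hashCode = 31 * hashCode + (item == null ? 0 : item.hashCode());
        }
        return hashCode;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("a", "b");
        System.out.println("sketch hash:   " + hash(row));
        System.out.println("List#hashCode: " + row.hashCode());
    }
}
```

Either way the result is a 32-bit non-cryptographic hash, so it is no substitute for SHA256/SHA512.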
Basically it's a classic hash algorithm as recommended in the book Effective Java. To quote a great man (and a great book):
I digress. You can look at the HASH source here. If you want to use SHAxxx in Hive, you can use the Apache Commons DigestUtils class and Hive's built-in reflect function (I hope that'll work):
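The query itself is missing here. With `reflect(class, method, args...)` it would presumably look like `SELECT reflect('org.apache.commons.codec.digest.DigestUtils', 'sha256Hex', 'your_string')`, assuming commons-codec is on Hive's classpath. I have not run that, so treat it as a sketch. To check what it returns, the following standalone Java computes the same kind of value (lowercase hex SHA-256 of the UTF-8 bytes) using only the JDK. The class name `Sha256HexSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; mirrors what DigestUtils.sha256Hex(String) is expected to return.
class Sha256HexSketch {

    static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Two lowercase hex characters per byte.
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex("your_string"));
    }
}
```

If the output of the Hive query for a given string matches this program's output for the same string, the reflect call is doing what you want. Swapping in `sha512Hex` and `"SHA-512"` should work the same way.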