I want to calculate hash for strings in hive without writing any UDF only using exisiting functions . So that I can use similar approach to get consistent hash in other languages. for ex : are there any functions using which I can do something like adding characters or taking Xor.
相关问题
-
hive: cast array
> into map - Find function in HIVE
- Hive Tez reducers are running super slow
- Set parquet snappy output file size is hive?
- Hive 'cannot alter table' error
相关文章
- 在hive sql里怎么把"2020-10-26T08:41:19.000Z"这个字符串转换成年月日
- SQL query Frequency Distribution matrix for produc
- Cloudera 5.6: Parquet does not support date. See H
- converting to timestamp with time zone failed on A
- Hive error: parseexception missing EOF
- ClassNotFoundException: org.apache.spark.SparkConf
- How to get previous day date in Hive
- Hive's hour() function returns 12 hour clock v
It depends on the version of Hive, cf. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Misc.Functions
select XYZ, hash(XYZ) from ABC
has been available for years and applies plain old
java.lang.String.hashCode()
, returning an INT (32 bit hash)[Edit 2] Actually it's a bit more complex since
hash()
accepts a list of arguments of any type (incl. primitive types that have no built-in hashing method), so a custom approach is used -- checkObjectInspectorUtils.hashCode()
andObjectInspectorUtils.getBucketHashCode()
in the source code here (for V2.1)select XYZ, crc32(XYZ) from ABC
requires Hive 1.3 and applies plain old Cyclic Redundancy Check (probably via
java.util.zip.CRC32
), returning a BIGINT (32 bit hash)select XYZ, md5(XYZ), sha1(XYZ), sha2(XYZ,256), sha2(XYZ,512) from ABC
requires Hive 1.3 and applies strong, cryptographic hash functions, returning a STRING with the hexadecimal representation of the binary (128, 160, 256 and 512 bit hashes)
[Edit 1] the answer to that post has also a very good workaround for applying crypto hash functions with older versions of Hive, using Apache Commons static methods and
reflect()
.