I know the hashing principle behind HashMap in Java, so I wanted to know how hashing works in Hive when we bucket data into various buckets.
I recently had to dig into some Hive source code to figure this out for myself. Here's what I found:
For an integer field, the hash is just the integer value itself. For a string, Hive uses a scheme very close to Java's String.hashCode(). When bucketing on multiple columns, the per-column hashes are combined in a way that closely mirrors Java's List.hashCode().
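To illustrate, here is a minimal Java sketch of that logic as I understood it from the source. It is an approximation, not Hive's actual implementation (the real code lives in Hive's ObjectInspector utilities, and the seed or per-type details may differ slightly):

```java
public class HiveHashSketch {

    // Integer columns: the bucketing hash is just the value itself.
    static int hashOfInt(int value) {
        return value;
    }

    // String columns: the same 31-based polynomial as java.lang.String.hashCode().
    static int hashOfString(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Multiple bucketing columns: field hashes folded together with the same
    // 31-multiplier scheme java.util.List.hashCode() uses.
    static int hashOfFields(int... fieldHashes) {
        int h = 0;
        for (int fh : fieldHashes) {
            h = 31 * h + fh;
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(hashOfInt(42));         // 42: an int hashes to itself
        System.out.println(hashOfString("hive"));  // same result as "hive".hashCode()
        System.out.println(hashOfFields(hashOfInt(42), hashOfString("hive")));
    }
}
```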
Bucketing is used along with partitioning to give the data a more finely decomposed structure for later analysis. Because a large number of partitions results in a large number of HDFS files, which can hurt NameNode performance, we resort to bucketing. The way bucketing actually works is:

bucketNumber = hashFunction(bucketingColumn) mod numOfBuckets

numOfBuckets is chosen when you create the table, and the output of the hash function depends on the type of the column chosen. To accurately set the number of reducers while bucketing and land the data in the right buckets, use "hive.enforce.bucketing = true". Please refer to this for more information.
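To make the mod step concrete, here is a small Java sketch of routing a row to a bucket. The getBucketNumber helper and the 4-bucket table are my own illustration; the sign-bit mask reflects my reading of how Hive keeps negative hash codes from producing a negative bucket index:

```java
public class BucketRouting {

    // Map a (possibly negative) hash code to a bucket in [0, numBuckets).
    // Masking with Integer.MAX_VALUE clears the sign bit, so the modulo
    // always yields a valid, non-negative bucket index.
    static int getBucketNumber(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        // e.g. CREATE TABLE t (...) CLUSTERED BY (user_id) INTO 4 BUCKETS;
        int numBuckets = 4;

        // An int column hashes to its own value, so user_id 7 lands in bucket 7 % 4 = 3.
        System.out.println(getBucketNumber(7, numBuckets));

        // A string column hashes like String.hashCode() before the mod is applied.
        System.out.println(getBucketNumber("alice".hashCode(), numBuckets));
    }
}
```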