Hive scanning entire data for bucketed table

Posted 2020-08-01 05:06

I was trying to optimize a Hive query by bucketing the data on a single column. I created the table with the following statement:

CREATE TABLE `source_bckt`(
  `uk` string, 
  `data` string)
CLUSTERED BY(uk) SORTED BY(uk) INTO 10 BUCKETS

I then inserted the data after executing "set hive.enforce.bucketing = true;".

When I run the query "select * from source_bckt where uk='1179724';", the MapReduce job that is spawned scans the entire set of files, even though the matching rows should all be in a single file, which can be identified by HASH('1179724') % 10.
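To make the HASH('1179724') % 10 calculation concrete, here is a minimal Python sketch. It assumes Hive hashes a string bucketing key the same way Java's String.hashCode() does for ASCII input and clears the sign bit before taking the modulus; newer Hive versions may use a different hash function, so treat the result as illustrative. The function names and the bucket file name pattern are my own assumptions, not Hive APIs.

```python
def java_string_hash(s: str) -> int:
    """Replicate Java's String.hashCode() as a signed 32-bit int (exact for ASCII input)."""
    h = 0
    for ch in s:
        # Keep only the low 32 bits, as Java int arithmetic overflows silently.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


def bucket_for(key: str, num_buckets: int) -> int:
    """Assumed bucketing rule: clear the sign bit, then take the modulus."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_buckets


def bucket_file_name(bucket: int) -> str:
    """Assumed naming pattern for bucket files (zero-based, six digits)."""
    return f"{bucket:06d}_0"


if __name__ == "__main__":
    b = bucket_for("1179724", 10)
    print(b, bucket_file_name(b))
```

Under these assumptions the key '1179724' maps to bucket 7 of the 10 buckets, so a bucket-aware scan would only need to read that one file.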

Any idea?

Tags: hadoop hive hiveql

1 answer

不美不萌又怎样

#2 · 2020-08-01 05:55

This optimization is not supported yet. The JIRA ticket tracking it currently has the status PATCH AVAILABLE:

https://issues.apache.org/jira/browse/HIVE-5831
