Hive scanning entire data for bucketed table

Published 2020-08-01 05:36

疯言疯语

Question:

I was trying to optimize a Hive query by bucketing the data on a single column. I created the table with the following statement:

CREATE TABLE `source_bckt`(
  `uk` string, 
  `data` string)
CLUSTERED BY(uk) SORTED BY(uk) INTO 10 BUCKETS

Then I inserted the data after executing "set hive.enforce.bucketing = true;".
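The exact load statement is not shown above. As a rough sketch only, a bucketed load of this kind usually looks like the following; the staging table `source_raw` is a hypothetical name used for illustration and is not part of the original setup.

```sql
-- Make Hive run one reducer per bucket so each bucket file holds only its own keys.
-- (This property exists in Hive 1.x; it was removed in Hive 2.0, where bucketing is always enforced.)
SET hive.enforce.bucketing = true;

-- `source_raw` is a hypothetical unbucketed staging table with the same columns.
INSERT OVERWRITE TABLE source_bckt
SELECT uk, data
FROM source_raw;
```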

When I run the query "select * from source_bckt where uk='1179724';", the MapReduce job it spawns scans the entire set of files, even though the matching rows should all be in a single file, which can be identified by HASH('1179724') % 10.
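The expectation above can be checked from within Hive. The queries below are a minimal sketch, assuming the legacy (pre-Hive 3) bucketing scheme, in which the bucket index is the column hash with the sign bit masked off, modulo the bucket count. The bucket number 3 in the last query is only a placeholder and should be replaced with the computed index plus one, since TABLESAMPLE numbers buckets from 1.

```sql
-- Expected 0-based bucket index for the key (assumes the legacy bucketing hash).
SELECT (hash('1179724') & 2147483647) % 10 AS expected_bucket;

-- Show which file(s) actually hold the key, using Hive's INPUT__FILE__NAME virtual column.
SELECT INPUT__FILE__NAME, uk
FROM source_bckt
WHERE uk = '1179724';

-- Ask Hive to read a single bucket explicitly.
-- BUCKET 3 is a placeholder: use expected_bucket + 1 from the first query.
SELECT *
FROM source_bckt TABLESAMPLE(BUCKET 3 OUT OF 10 ON uk) s
WHERE uk = '1179724';
```

Because the TABLESAMPLE clause is on the same column the table is clustered by, Hive should be able to restrict the scan to the matching bucket file, whereas a plain WHERE filter on uk may not trigger that pruning, depending on the Hive version and execution engine.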

Any idea?