I was trying to optimize a Hive SQL query by bucketing the data on a single column. I created the table with the following statement:
CREATE TABLE `source_bckt`(
`uk` string,
`data` string)
CLUSTERED BY(uk) SORTED BY(uk) INTO 10 BUCKETS
I then inserted the data after executing "set hive.enforce.bucketing = true;".
When I run the following select, "select * from source_bckt where uk='1179724';", the MapReduce job that is spawned scans the entire set of files, even though the matching rows should be in a single file, which can be identified by the equation HASH('1179724') % 10.
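For reference, this is roughly how the expected bucket could be checked. It is only a sketch under assumptions: that the built-in hash() function uses the same hash as bucket assignment (which may not hold on every Hive version), that the bucket index is the non-negative hash modulo the bucket count, and that the Hive version allows a SELECT without a FROM clause.

```sql
-- Expected bucket index for the key, assuming
-- bucket = non-negative hash(uk) modulo the number of buckets (10 here)
SELECT pmod(hash('1179724'), 10);

-- Which file(s) actually hold rows for the key,
-- using the INPUT__FILE__NAME virtual column
SELECT INPUT__FILE__NAME, count(*)
FROM source_bckt
WHERE uk = '1179724'
GROUP BY INPUT__FILE__NAME;
```

If the second query returns a single file name, the data is bucketed as intended and only the pruning at query time is missing.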
Any idea why Hive does not limit the scan to that one bucket file?