Hive query not using index

2019-06-23 23:36发布

I am analyzing the impact of using an index on hive table. I created a table with 5 columns (COL1,COL2,COL3,COL4,COL5) and loaded 100000 rows in it. I also created an index on COL1 on this table.

I ran select * with WHERE clause on COL1 which is an index column.

I see no improvement in query run-time compared to when I ran the same query before creating the index.

I did an EXPLAIN on my select query and it shows TableScan instead of IndexScan and I am unable to figure out why it's not using the index.

Please help.

标签： hadoop hive

1条回答

劳资没心，怎么记你

2楼-- · 2019-06-23 23:54

You can check this and this but basically it is as following;

Create the index

CREATE INDEX .. ON TABLE...
Build the index

ALTER INDEX .. ON .. REBUILD;
Use the index

INSERT OVERWRITE DIRECTORY '/tmp/indexes/..' SELECT _bucketname, _offsets FROM default__t_..__...

SET hive.index.compact.file=/tmp/indexes/x;

SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

SELECT ... from ... where ... group by ...;

Hope it helps

0人赞添加讨论(0) 举报

Hive query not using index

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间