Hive query not using index

2019-06-23 23:36发布

I am analyzing the impact of using an index on hive table. I created a table with 5 columns (COL1,COL2,COL3,COL4,COL5) and loaded 100000 rows in it. I also created an index on COL1 on this table.

I ran select * with WHERE clause on COL1 which is an index column.

I see no improvement in query run-time compared to when I ran the same query before creating the index.

I did an EXPLAIN on my select query and it shows TableScan instead of IndexScan and I am unable to figure out why it's not using the index.

Please help.

标签: hadoop hive
1条回答
劳资没心,怎么记你
2楼-- · 2019-06-23 23:54

You can check this and this but basically it is as following;

  1. Create the index

    CREATE INDEX .. ON TABLE...

  2. Build the index

    ALTER INDEX .. ON .. REBUILD;

  3. Use the index

    INSERT OVERWRITE DIRECTORY '/tmp/indexes/..' SELECT _bucketname, _offsets FROM default__t_..__...

    SET hive.index.compact.file=/tmp/indexes/x;

    SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

    SELECT ... from ... where ... group by ...;

Hope it helps

查看更多
登录 后发表回答