I am analyzing the impact of using an index on hive table. I created a table with 5 columns (COL1
,COL2
,COL3
,COL4
,COL5
) and loaded 100000 rows in it. I also created an index on COL1 on this table.
I ran select *
with WHERE
clause on COL1
which is an index column.
I see no improvement in query run-time compared to when I ran the same query before creating the index.
I did an EXPLAIN on my select query and it shows TableScan instead of IndexScan and I am unable to figure out why it's not using the index.
Please help.
You can check this and this but basically it is as following;
Create the index
CREATE INDEX .. ON TABLE...
Build the index
ALTER INDEX .. ON .. REBUILD;
Use the index
INSERT OVERWRITE DIRECTORY '/tmp/indexes/..' SELECT
_bucketname
,_offsets
FROM default__t_..__...SET hive.index.compact.file=/tmp/indexes/x;
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
SELECT ... from ... where ... group by ...;
Hope it helps