How to find the most recent partition in HIVE tabl

2019-03-26 07:39发布

I have a partitioned table - with 201 partitions. I need to find latest partition in this table and use it to post process my data. The query to find list of all partitions is :

use db;
show partitions table_name; 

I need a query to find the latest of these partitions. The partitions are in format

ingest_date=2016-03-09

I tried using max() which gave me a wrong result. I do not want to traverse through entire table by doing

select max(ingest_date) from db.table_name; 

This would give me the expected output.. but kill the whole point of having partitions in the 1st place.

Is there a more efficient query to get the latest partition for HIve table ?

2条回答
叛逆
2楼-- · 2019-03-26 07:47

You can use "show partitions":

hive -e "set hive.cli.print.header=false;show partitions table_name;" | tail -1 | cut -d'=' -f2

This will give you "2016-03-09" as output.

查看更多
不美不萌又怎样
3楼-- · 2019-03-26 07:51

If you want to avoid running the "show partitions" in hive shell as suggested above, you can apply a filter to your max() query. That will avoid doing a fulltable scan and results should be fairly quick!

select max(ingest_date) from db.table_name where ingest_date>date_add(current_date,-3) will only scan 2-3 partitions.

查看更多
登录 后发表回答