Dropping hive partition based on certain condition

2019-09-16 19:56发布

I have a table in hive built using the following command:

create table t1 (x int, y int, s string) partitioned by (wk int) stored as sequencefile;

The table has the data below:

select * from t1;
+-------+-------+-------+--------+--+
| t1.x  | t1.y  | t1.s  | t1.wk  |
+-------+-------+-------+--------+--+
| 1     | 2     | abc   | 10     |
| 4     | 5     | xyz   | 11     |
| 7     | 8     | pqr   | 12     |
+-------+-------+-------+--------+--+

Now the ask is to drop the oldest partition when partition count is >=2 Can this be handled in hql or through any shell script and how?

Considering I will be using dbname as variable like hive -e 'use "$dbname"; show partitions t1

标签: shell hive
1条回答
乱世女痞
2楼-- · 2019-09-16 20:39

If your partitions are ordered by date, you could write a shell script in which you could use hive -e 'SHOW PARTITIONS t1' to get all partitions, in your example, it will return:

wk=10
wk=11
wk=12

Then you can issue hive -e 'ALTER TABLE t1 DROP PARTITION (wk=10)' to remove the first partition;

So something like:

db=mydb
if (( `hive -e "use $db; SHOW PARTITIONS t1" | grep wk | wc -l` < 2)) ; then
    exit;
fi
partition=`hive -e "use $db; SHOW PARTITIONS t1" | grep wk | head -1`;
hive -e "use $db; ALTER TABLE t1 DROP PARTITION ($partition)";
查看更多
登录 后发表回答