Aggregate queries fail in hive if partition direct

2019-09-08 18:17发布

问题:

I am using Hive v1.2.1 with Tez. I have an external partitioned table. The partitions are hourly and of the form p=yyyy_mm_dd_hh. The situation is that these partition directories in hdfs are likely to be deleted sometime. After they are deleted, hive still contains the metadata for that partition, and a command 'show partitions ' would still list the partition whose directory was deleted from hdfs. Normally, this is not likely to cause any problem, and a select query for the partition(whose directory was deleted) would simply result an empty resultset:

hive> select * from test_tab where p='2015_01_01_01';
OK
Time taken: 2.168 seconds

However, on running any aggregate query against the same partition, I get an error:

hive> select count(*) from test_tab where p='2015_01_01_01';
FAILED: SemanticException java.io.FileNotFoundException: File hdfs://localhost:8020/user/root/data/test_db/test_tab/p=2015_01_01_01 does not exist.

I need to have the same behavior in aggregate queries as that in other select queries. This is probably a bug in hive. Any workaround-hints for this issue would be appreciated. Best Regards.

回答1:

run below command

msck repair table test_tab;

and then run your query