What is the way to automatically update the metadata of Hive partitioned tables?
If new partition data was added to HDFS (without running an ALTER TABLE ADD PARTITION command), we can sync the metadata by running the command 'msck repair'.
What should be done if a lot of partition data was deleted from HDFS (without running an ALTER TABLE DROP PARTITION command)?
What is the way to sync up the Hive metadata in that case?
Ensure the table is set to external (so that dropping partitions removes only the metadata, not the files on HDFS), drop all partitions, then run the table repair.
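A minimal sketch of that sequence, written as a shell function that only prints the HiveQL so it can be reviewed before running. The function name `resync_statements`, the table name `my_table` and the partition column `dt` are hypothetical placeholders, and the drop-all predicate `dt <> ''` assumes a string-typed partition column:

```shell
#!/bin/sh
# Print the HiveQL for "make external, drop all partitions, repair".
# Arguments: $1 = table name, $2 = one partition column (both placeholders).
resync_statements() {
    table="$1"
    part_col="$2"
    # Mark the table external so the drop below touches metadata only.
    printf "ALTER TABLE %s SET TBLPROPERTIES ('EXTERNAL'='TRUE');\n" "$table"
    # Assumption: a string partition column, so "<> ''" matches every partition.
    printf "ALTER TABLE %s DROP IF EXISTS PARTITION (%s <> '');\n" "$table" "$part_col"
    # Re-register whatever partition folders still exist on HDFS.
    printf "MSCK REPAIR TABLE %s;\n" "$table"
}

resync_statements my_table dt
```

The output can be redirected to a file and passed to `hive -f` once it looks right for your table.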
If MSCK REPAIR throws an error, start Hive from the terminal with:
hive --hiveconf hive.msck.path.validation=ignore
or run this inside the Hive session:
set hive.msck.path.validation=ignore;
Try using
As correctly stated by HakkiBuyukcengiz, MSCK REPAIR doesn't remove partitions if the corresponding folder on HDFS was manually deleted; it only adds partitions if new folders are created. Extract from the official documentation:
This is what I usually do with external tables when multiple partition folders have been manually deleted on HDFS and I want to quickly refresh the partitions:
1) DROP TABLE table_name (dropping an external table does not delete the underlying partition files)
2) CREATE EXTERNAL TABLE table_name ...
3) MSCK REPAIR TABLE table_name
Depending on the number of partitions, this can take a long time. The other solution is to use ALTER TABLE DROP PARTITION (...) for each deleted partition folder, but this can be tedious if multiple partitions were deleted.
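The per-partition route can be made less tedious by generating the statements. The sketch below assumes you already have a list of the deleted partitions in the `col=value/col=value` form that partition folders use, one per line on standard input. The function name `drop_partition_statements` and the sample table and values are hypothetical, all values are quoted as strings, and values containing `=` or `/` are not handled:

```shell
#!/bin/sh
# Read partition specs such as "dt=2020-01-01/country=US" from stdin and
# print one ALTER TABLE ... DROP PARTITION statement per line.
# Argument: $1 = table name (placeholder).
drop_partition_statements() {
    table="$1"
    while IFS= read -r spec; do
        # Skip blank lines in the input.
        [ -z "$spec" ] && continue
        # dt=2020-01-01/country=US  ->  dt='2020-01-01', country='US'
        clause=$(printf '%s\n' "$spec" |
            sed -e "s/=/='/g" -e "s|/|', |g" -e "s/\$/'/")
        printf 'ALTER TABLE %s DROP IF EXISTS PARTITION (%s);\n' "$table" "$clause"
    done
}

printf '%s\n' 'dt=2020-01-01/country=US' 'dt=2020-01-02/country=DE' |
    drop_partition_statements my_table
# prints:
# ALTER TABLE my_table DROP IF EXISTS PARTITION (dt='2020-01-01', country='US');
# ALTER TABLE my_table DROP IF EXISTS PARTITION (dt='2020-01-02', country='DE');
```

As with any generated DDL, it is worth reading the output before feeding it to `hive -f`.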