Can I move data from one Hive partition to another?

Posted 2019-08-19 05:45

My table is partitioned by year/month/day. Using the week-year pattern (YYYY) in SimpleDateFormat created a wrong partition: the data for 2017-12-31 was written to the 2018-12-31 partition.

   SimpleDateFormat sdf = new SimpleDateFormat("YYYY-MM-dd");

What I want is to move the data from partition 2018/12/31 to partition 2017/12/31 of the same table. I could not find any documentation on how to do this.
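
For reference, the week-year behaviour described above can be reproduced in isolation. This is only a minimal sketch: the class name WeekYearDemo and the choice of Locale.US are assumptions made for illustration (the week rules, and so the result of YYYY, depend on the locale). It formats 2017-12-31 with both patterns to show why YYYY lands in 2018 while yyyy does not.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class WeekYearDemo {
    // Formats a date with the given pattern. Locale.US is an assumption made here
    // so that the week rules (and therefore the result of "YYYY") are fixed.
    public static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern, Locale.US).format(date);
    }

    public static void main(String[] args) {
        // 31 December 2017, local midnight in the JVM's default time zone.
        Date lastDayOf2017 = new GregorianCalendar(2017, Calendar.DECEMBER, 31).getTime();

        // "YYYY" is the week year: under US week rules this Sunday opens week 1 of 2018.
        System.out.println(format("YYYY-MM-dd", lastDayOf2017)); // 2018-12-31
        // "yyyy" is the calendar year, which is what a year/month/day partition needs.
        System.out.println(format("yyyy-MM-dd", lastDayOf2017)); // 2017-12-31
    }
}
```

Switching the pattern to yyyy-MM-dd should stop new data from landing in the wrong partition; the data already written still has to be moved.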

2 answers
放荡不羁爱自由
#2 · 2019-08-19 06:09

There is a JIRA issue related to this: https://issues.apache.org/jira/browse/SPARK-19187. Upgrading Spark to version 2.0.1 should fix the problem.

够拽才男人
#3 · 2019-08-19 06:17

From what I understand, you would like to move the data from the 2018-12-31 partition to the 2017-12-31 partition. Here is how you can do it.

# From Hive/Beeline (partition values are string literals, so they must be quoted)
ALTER TABLE TableName PARTITION (PartitionCol='2018-12-31') RENAME TO PARTITION (PartitionCol='2017-12-31');

From Spark code, you basically have to initialize the HiveContext and run the same HQL through it. You can refer to one of my answers here on how to initialize the HiveContext.
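
As a rough sketch of the Spark route, the statement can be built as a plain string and then handed to the Hive-enabled context. The class RenamePartitionSql and its build method are hypothetical helpers, and TableName and PartitionCol are the same placeholders used in the HQL above. The Spark call itself is left as a comment because it needs Spark on the classpath and a reachable metastore.

```java
public class RenamePartitionSql {
    // Builds the rename statement; the partition values are quoted as string literals.
    // Table and column names are inserted as-is (no escaping), so this is only
    // suitable for trusted, hard-coded identifiers.
    public static String build(String table, String partitionCol, String from, String to) {
        return String.format(
            "ALTER TABLE %s PARTITION (%s='%s') RENAME TO PARTITION (%s='%s')",
            table, partitionCol, from, partitionCol, to);
    }

    public static void main(String[] args) {
        String sql = build("TableName", "PartitionCol", "2018-12-31", "2017-12-31");
        System.out.println(sql);
        // With Spark available, the string would be passed to the session, for example
        // hiveContext.sql(sql) on Spark 1.x or sparkSession.sql(sql) on Spark 2.x and later.
    }
}
```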

# If you want to do it at the HDFS level, below is one approach
# From Hive/Beeline, run the HQL below to create the target partition
ALTER TABLE TableName ADD IF NOT EXISTS PARTITION (PartitionCol='2017-12-31');

# Now, from HDFS, move the files from the 2018 partition directory into the 2017 one
hdfs dfs -mv /your/table_hdfs/path/schema.db/tableName/PartitionCol=2018-12-31/* /your/table_hdfs/path/schema.db/tableName/PartitionCol=2017-12-31/

# Remove the now-empty 2018 partition directory if you no longer need it
hdfs dfs -rm -r /your/table_hdfs/path/schema.db/tableName/PartitionCol=2018-12-31

# You can also drop the partition from Beeline/Hive
ALTER TABLE tableName DROP IF EXISTS PARTITION (PartitionCol='2018-12-31');

# At the end, repair the table
MSCK REPAIR TABLE tableName;

Why do I have to repair the table?
