My partitions are based on year/month/day. Using SimpleDateFormat's week-year pattern created a wrong partition: because the format used YYYY, the data for the date 2017-12-31 was moved to a 2018-12-31 partition.
SimpleDateFormat sdf = new SimpleDateFormat("YYYY-MM-dd");
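To see why, here is a minimal Java sketch (the WeekYearDemo class name and the explicit Locale.US are my own additions for determinism; week-year boundaries depend on the locale's first day of week):

import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;

public class WeekYearDemo {
    public static void main(String[] args) {
        // Build 2017-12-31, which falls on a Sunday.
        Calendar cal = Calendar.getInstance(Locale.US);
        cal.set(2017, Calendar.DECEMBER, 31);

        // "YYYY" is the week year: in a Sunday-first locale such as en_US,
        // 2017-12-31 starts week 1 of 2018, so the year rolls over early.
        System.out.println(new SimpleDateFormat("YYYY-MM-dd", Locale.US).format(cal.getTime()));
        // -> 2018-12-31

        // "yyyy" is the calendar year, which is what a date partition needs.
        System.out.println(new SimpleDateFormat("yyyy-MM-dd", Locale.US).format(cal.getTime()));
        // -> 2017-12-31
    }
}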
So what I want is to move my data from partition 2018/12/31 to 2017/12/31 of the same table. I could not find any relevant documentation on how to do this.
From what I understood, you would like to move the data from the 2018-12-31 partition to 2017-12-31. Below is my explanation of how you can do it.
#From Hive/Beeline
ALTER TABLE TableName PARTITION (PartitionCol='2018-12-31') RENAME TO PARTITION (PartitionCol='2017-12-31');
From Spark code, you basically have to initiate the HiveContext and run the same HQL from it. You can refer to one of my answers here on how to initiate the HiveContext.
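For example, here is a minimal Spark 2.x sketch in Java (the RenamePartition class name, the app name, and the table/column names are placeholders; on Spark 2.x, SparkSession with enableHiveSupport() plays the role of the old HiveContext):

import org.apache.spark.sql.SparkSession;

public class RenamePartition {
    public static void main(String[] args) {
        // enableHiveSupport() gives the session access to the Hive metastore,
        // which is what the HiveContext used to provide.
        SparkSession spark = SparkSession.builder()
                .appName("rename-partition")
                .enableHiveSupport()
                .getOrCreate();

        // Run the same HQL as from Hive/Beeline.
        spark.sql("ALTER TABLE TableName PARTITION (PartitionCol='2018-12-31') "
                + "RENAME TO PARTITION (PartitionCol='2017-12-31')");

        spark.stop();
    }
}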
#If you want to do it at the HDFS level, below is one of the approaches
#From Hive/Beeline, run the below HQL
ALTER TABLE TableName ADD IF NOT EXISTS PARTITION (PartitionCol='2017-12-31');
#Now from HDFS, move the data from the 2018 partition to the 2017 partition
hdfs dfs -mv /your/table_hdfs/path/schema.db/tableName/PartitionCol=2018-12-31/* /your/table_hdfs/path/schema.db/tableName/PartitionCol=2017-12-31/
#Remove the 2018 partition if you no longer need it
hdfs dfs -rm -r /your/table_hdfs/path/schema.db/tableName/PartitionCol=2018-12-31
#You can also drop it from Hive/Beeline
alter table tableName drop if exists partition (PartitionCol='2018-12-31');
#At the end, repair the table
msck repair table tableName;
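#Optionally, verify that the data now shows up under the 2017 partition
show partitions tableName;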
Why do I have to repair the table?
When you move files directly on HDFS, the Hive metastore does not know about the change, so the partition metadata can go stale; MSCK REPAIR TABLE re-syncs the metastore with the files actually on HDFS. There is also a JIRA related to that: https://issues.apache.org/jira/browse/SPARK-19187. Upgrading your Spark version to 2.0.1 should fix the problem.