How to merge small files from existing partitions

2019-07-26 05:01发布

问题:

How to merge existing Partition small files into one large file in one of the Partition .

For example I have a table user1, it contain columns fname,lname and partition column is day.

I have created table by using below script

CREATE TABLE user1(fname string,lname string) parittioned By (day int);

After inserting data into partion table it will look like below.

 fname  lname  day
.....................
AA      AAA   20170201     ....>partition 20170201
BB      BBB   20170201
...................
CC      CCC   20170202    ......>partition 20170202
DD      DDD   20170202
....................
EE      EEE   20170203    .......>partition 20170203
FF      FFF   20170203
.......................
GG      GGG   20170204    ........>partition 20170204         
HH      HHH   20170204
.......................

When I execute select query with the help of partition column i.e. day=20170201.

select * from user1 where day=20170201;

It will give result like below

AA      AAA   20170201
BB      BBB   20170201

based on above table i want to merge the all small files i.e day =20170201 and day =20170202 and day=20170203 into partition day=20170203 in my partition table (i.e USer1).i.e. It should look like below.

fname  lname  day
.....................
AA      AAA   20170201
BB      BBB   20170201
CC      CCC   20170202    
DD      DDD   20170202
E       EEE   20170203    .......>partition 20170203
FF      FFF   20170203
.......................
GG      GGG   20170204    ........>partition 20170204         
HH      HHH   20170204
.......................

can you please suggest on this,How can I achieve this?

Thanks in Advance.

回答1:

  1. Create new table partitioned by new field partition_day:
CREATE TABLE user_new(fname string,lname string, day int) parittioned By (partition_day int);
  1. Load data into new table (define your conditions for new partitionsin the case )
   insert overwrite table user_new partition (partition_day)
    select fname,lname, day,
           case when day <= 20170203 then 20170203
                when day >  20170203 then 20170204
           end as partition_day
      from user1 ;