Create Table in Hive with one file

2019-01-20 17:40发布

I'm creating a new table in Hive using:

CREATE TABLE new_table AS select * from old_table;

My problem is that after the table is created, It generates multiple files for each partition - while I want only one file for each partition.

How can I define it in the table? Thank you!

标签： hive create-table

1条回答

成全新的幸福

2楼-- · 2019-01-20 18:19

There are many possible solutions:

1) Add distribute by partition key at the end of your query. Maybe there are many partitions per reducer and each reducer creates files for each partition. This may reduce the number of files and memory consumption as well. hive.exec.reducers.bytes.per.reducer setting will define how much data each reducer will process.

2) Simple, quite good if there are not too much data: add order by to force single reducer. Or increase hive.exec.reducers.bytes.per.reducer=500000000; --500M files. This is for single reducer solution is for not too much data, it will run slow if there are a lot of data.

If your task is map-only then better consider options 3-5:

3) If running on mapreduce, switch-on merge:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=500000000;  --Size of merged files at the end of the job
set hive.merge.smallfiles.avgsize=500000000; --When the average output file size of a job is less than this number, 
--Hive will start an additional map-reduce job to merge the output files into bigger files

4) When running on Tez

set hive.merge.tezfiles=true; 
set hive.merge.size.per.task=500000000;
set hive.merge.smallfiles.avgsize=500000000;

5) For ORC files you can merge files efficiently using this command: ALTER TABLE T [PARTITION partition_spec] CONCATENATE; - for ORC

0人赞添加讨论(0) 举报

Create Table in Hive with one file

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间