How to avoid generating empty .deflate files for a

Posted 2019-05-12 01:35

Question:

When I run a Hive query, a large number of empty .deflate files are generated (they are actually about 8 bytes, which I think is the minimum size for a .deflate file). I suspect this is happening because the query requires a large number of reducers. I am wondering if there is a way to avoid generating these empty .deflate files?

Thanks in advance,

Lin

Answer 1:

.deflate is the file extension produced by Hadoop's default compression codec (DefaultCodec).

There are compression settings for Hive that can be used to reduce the amount of disk space that Hive uses for its queries.

When hive.exec.compress.output=true, Hive compresses the final query output it writes to HDFS, using the codec configured by the mapred.output.compression.codec property. These properties can be set in hive-site.xml or in the Hive CLI.

To enable output compression from the Hive CLI:

hive> set hive.exec.compress.output=true;

To enable output compression using hive-site.xml:

<property>
 <name>hive.exec.compress.output</name>
 <value>true</value>
</property>
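
The codec itself can be chosen in the same session. The following is a minimal sketch: the Hadoop property mapred.output.compression.codec (named mapreduce.output.fileoutputformat.compress.codec in newer Hadoop releases) selects the codec for the final output, and GzipCodec is only an illustrative choice that the answer above does not prescribe.

```sql
-- Compress the final output of Hive queries.
set hive.exec.compress.output=true;
-- Assumption: GzipCodec is an example; substitute any codec class installed on the cluster.
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```

With a codec other than the default, the output files carry that codec's extension (for example .gz) instead of .deflate.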

So, to stop Hive from writing .deflate files, disable output compression (the output is then written uncompressed):

set hive.exec.compress.output=false;
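
Turning off compression changes the file format, not the number of files, so a query with many reducers may still leave many tiny files behind. A possible way to reduce their number, sketched here with illustrative and untuned values, is to have Hive merge small output files and to give each reducer more input so that fewer reducers run. Whether this removes every empty file depends on the Hive version and execution engine, so treat it as a starting point rather than a guaranteed fix.

```sql
-- Assumption: all values below are examples, not recommendations for a specific cluster.
-- Merge small files produced by map-only jobs.
set hive.merge.mapfiles=true;
-- Merge small files produced by jobs that have a reduce phase.
set hive.merge.mapredfiles=true;
-- Run the merge step when the average output file size falls below about 16 MB.
set hive.merge.smallfiles.avgsize=16000000;
-- More input bytes per reducer means fewer reducers, and so fewer output files.
set hive.exec.reducers.bytes.per.reducer=1000000000;
```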



Tags: hadoop hive