When I run a Hive query, a large number of empty .deflate files are generated (they are actually about 8 bytes, which I think is the minimum size for a .deflate file). I suspect this is happening because the query requires a large number of reducers. Is there a way to avoid generating these empty .deflate files?
Thanks in advance,
Lin
.deflate is the file extension written by the default compression codec.
There are compression settings for Hive that can be used to reduce the amount of disk space that Hive uses for its queries.
When the property hive.exec.compress.output is set to true, Hive compresses the query output it stores in HDFS, using the codec configured by the mapred.output.compression.codec property (mapred.map.output.compression.codec applies to intermediate map output instead). These properties can be set in hive-site.xml or from the Hive CLI.
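As a rough illustration of how the two properties fit together, the session below turns on output compression and picks a codec explicitly. It is only a sketch: it assumes the older mapred.* property name for the final-output codec (newer Hadoop releases also accept mapreduce.output.fileoutputformat.compress.codec), and GzipCodec is chosen purely as an example.

```sql
-- Sketch only: the codec choice is an example, not a recommendation.
-- Turn on compression of the final query output.
set hive.exec.compress.output=true;
-- Assumed property name for the final-output codec (older Hadoop naming).
set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
```

With a codec set this way, output files would be expected to carry that codec's extension (.gz for GzipCodec) rather than .deflate.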
To enable output compression from the Hive CLI:
hive> set hive.exec.compress.output=true;
To enable output compression using hive-site.xml:
<property>
  <name>hive.exec.compress.output</name>
  <value>true</value>
</property>
So to stop Hive from writing .deflate files, disable output compression:
set hive.exec.compress.output=false;
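To check that the change applies to the current session, the value can be read back: in the Hive CLI, `set` followed by only a property name prints that property's current value. A minimal sketch, assuming it is run in the same session as the query:

```sql
-- Disable compression of the final query output for this session.
set hive.exec.compress.output=false;
-- Print the current value of the property to confirm the change.
set hive.exec.compress.output;
```

A value set this way lasts only for the session; making it permanent would mean putting the same property in hive-site.xml.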