When I run a Hive query, a large number of empty .deflate files are generated (they are actually about 8 bytes each, which I think is the minimum size for a .deflate file). I suspect this is happening because the query requires a large number of reducers. I am wondering if there is a way to avoid generating these empty .deflate files?
Thanks in advance,
Lin
.deflate is the file extension written by the default compression codec.
There are compression settings for Hive that can be used to reduce the amount of disk space that Hive uses for its queries. When the property hive.exec.compress.output is set to true, Hive will use the codec configured by the mapred.output.compression.codec property to compress the output it stores in HDFS. These properties can be set in hive-site.xml or in the Hive CLI.

To enable output compression from the Hive CLI:
hive> set hive.exec.compress.output=true;
To enable output compression using hive-site.xml, set the hive.exec.compress.output property to true in that file.
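A minimal sketch of what that entry might look like, assuming the usual Hadoop-style property layout for hive-site.xml. The property name comes from the text above; the surrounding configuration element is shown only for context and will already exist in a real file:

```xml
<configuration>
  <!-- Compress the final output of Hive queries.
       Set the value to false to turn output compression off. -->
  <property>
    <name>hive.exec.compress.output</name>
    <value>true</value>
  </property>
</configuration>
```

A value set here acts as the default for every session, while a set command in the Hive CLI overrides it for the current session only.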
So to disable the .deflate files, set:
hive> set hive.exec.compress.output=false;