How to use Snappy in Hadoop in a container format

Posted 2019-04-09 12:36

Question:

I have to use Snappy to compress both the map output and the final map-reduce output. In addition, the output should be splittable.

From what I have read online, to make Snappy produce splittable output, it has to be used inside a container-like format.

Can you please suggest how to go about this? I tried to find some examples online, but could not find one. I am using Hadoop v0.20.203.

Thanks, Piyush

Answer 1:

For the job output:

// old mapred API: conf is a JobConf
conf.setOutputFormat(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
SequenceFileOutputFormat.setCompressOutput(conf, true);
conf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");
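
If you want to double-check that the job really wrote block-compressed, Snappy-coded SequenceFiles, a small standalone check like the sketch below can help (not part of the original answer; the class name and the path argument are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class CheckSeqFileCompression {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // pass the path to one of the job's part files on the command line
        Path part = new Path(args[0]);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        System.out.println("block compressed: " + reader.isBlockCompressed());
        System.out.println("codec: " + reader.getCompressionCodec());
        reader.close();
    }
}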

For the map output:

Configuration conf = new Configuration();
conf.setBoolean("mapred.compress.map.output", true);
conf.set("mapred.map.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");
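
Not from the original answer, but as a sketch of how the two snippets fit into one complete driver, assuming the old org.apache.hadoop.mapred API and the built-in IdentityMapper/IdentityReducer as stand-ins for your own classes:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SnappySeqFileJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SnappySeqFileJob.class);
        conf.setJobName("snappy-seqfile");

        // compress the intermediate map output with Snappy
        conf.setCompressMapOutput(true);
        conf.setMapOutputCompressorClass(SnappyCodec.class);

        // write the final output as a block-compressed (splittable) SequenceFile
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, SnappyCodec.class);

        // identity mapper/reducer keep the sketch self-contained; use your own classes here
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);  // TextInputFormat keys are byte offsets
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}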



Answer 2:

In the new API, the OutputFormat and its compression settings are configured on the Job, not on the Configuration. The first part then becomes:

Job job = new Job(conf);
...
job.setOutputFormatClass(SequenceFileOutputFormat.class);
SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
SequenceFileOutputFormat.setCompressOutput(job, true);
// new Job(conf) copies the Configuration, so set the codec through the job
// rather than on conf afterwards:
SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);