Possible Duplicate:
Why can't hadoop split up a large text file and then compress the splits using gzip?
I found that when using an input file that is gzipped, Hadoop allocates only one map task to handle my map/reduce job.
The gzipped file is more than 1.4 GB, so I would expect many mappers to run in parallel (exactly as happens with an unzipped file).
Is there any configuration I can change to improve this?
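For reference, a quick way to see what is happening (a sketch only; the class name is made up and the path is passed as an argument) is to ask TextInputFormat directly how many splits it would create. A .gz input yields a single split, while the same data uncompressed yields roughly one split per HDFS block:

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class SplitCount {
        public static void main(String[] args) throws Exception {
            // args[0] is the input path, e.g. /data/big-file.gz
            Job job = Job.getInstance(new Configuration(), "split-count");
            FileInputFormat.addInputPath(job, new Path(args[0]));

            // TextInputFormat refuses to split inputs compressed with a
            // non-splittable codec such as gzip, so a .gz file gives 1 split.
            List<InputSplit> splits = new TextInputFormat().getSplits(job);
            System.out.println(args[0] + " -> " + splits.size() + " split(s)");
        }
    }

Running it against the 1.4 GB .gz file and against an uncompressed copy shows exactly the difference you are describing.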
Gzip files can't be split, so all of the data is processed by a single map task. You have to use a compression format whose files can be split (for example, indexed LZO, or bzip2 in recent Hadoop versions); then the data will be processed by multiple map tasks. Here is a nice article on it. (1)
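As a minimal sketch (the paths and class names here are made up, and it assumes a Hadoop version where BZip2Codec is splittable), you can run a one-off map-only job that re-writes the gzipped data as bzip2, so that downstream jobs get one map task per split instead of one per file:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.BZip2Codec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class RecompressToBzip2 {
        // Identity mapper: copies each input line through unchanged.
        public static class LineMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                context.write(NullWritable.get(), value);
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "gzip-to-bzip2");
            job.setJarByClass(RecompressToBzip2.class);
            job.setMapperClass(LineMapper.class);
            job.setNumReduceTasks(0);
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);

            // Hadoop decompresses the .gz input transparently, but as one split.
            FileInputFormat.addInputPath(job, new Path("/data/input/big-file.gz"));

            // Write bzip2-compressed output, which later jobs can split.
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
            FileOutputFormat.setOutputPath(job, new Path("/data/input-bzip2"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The trade-off is that bzip2 is noticeably slower to compress and decompress than gzip, which is why the indexed-LZO approach from the article is also popular.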
Edit: Here is another article on Snappy (2), a compression codec that comes from Google.
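Note that raw Snappy files are not splittable either; the usual pattern is to store the data in block-compressed SequenceFiles with Snappy, which are splittable because the container is compressed block by block. A rough sketch (assuming the native Snappy libraries are installed; the job would be configured like the driver above):

    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class SnappySequenceFileOutput {
        // Configure a job to write block-compressed SequenceFiles with Snappy;
        // the block structure keeps the output splittable even though plain
        // .snappy files are not.
        public static void configure(Job job) {
            job.setOutputFormatClass(SequenceFileOutputFormat.class);
            SequenceFileOutputFormat.setCompressOutput(job, true);
            SequenceFileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(
                    job, SequenceFile.CompressionType.BLOCK);
        }
    }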
(1) http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
(2) http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/