Fail to Increase Hive Mapper Tasks?

2019-02-20 07:45发布

I have a managed Hive table, which contains only one 150MB file. I then do "select count(*) from tbl" to it, and it uses 2 mappers. I want to set it to a bigger number.

First I tried 'set mapred.max.split.size=8388608;', so hopefully it will use 19 mappers. But it only uses 3. Somehow it still split the input by 64MB. I also used 'set dfs.block.size=8388608;', not working either.

Then I tried a vanilla map-reduce job to do the same thing. It initially uses 3 mappers, and when I set mapred.max.split.size, it uses 19. So the problem lies in Hive, I suppose.

I read some of the Hive source code, like CombineHiveInputFormat, ExecDriver, etc. can't find a clue.

What else settings can I use?

标签： hadoop hive

2条回答

女痞

2楼-- · 2019-02-20 07:51

I combined @javadba 's answer with that I received from Hive mailing list, here's the solution:

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapred.map.tasks = 20;
select count(*) from dw_stage.st_dw_marketing_touch_pi_metrics_basic;

From the mailing list:

It seems that HIVE is using the old Hadoop MapReduce API and so mapred.max.split.size won't work.

I would dig into source code later.

0人赞添加讨论(0) 举报

Melony?

3楼-- · 2019-02-20 07:52

Try adding the following:

set hive.merge.mapfiles=false;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

0人赞添加讨论(0) 举报

Fail to Increase Hive Mapper Tasks?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间