Ability to limit maximum reducers for a Hadoop Hive job


Question:

I've tried prepending my query with:

set mapred.running.reduce.limit = 25;

And

 set hive.exec.reducers.max = 35;

The last one jailed a job with 530 reducers down to 35... which makes me think it was going to try to shoehorn 530 reducers' worth of work into 35.
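For reference, this is roughly how the cap is prepended to a session in practice. A minimal sketch; the table and column names (page_views, dt) are placeholders, not anything from the question:

-- Cap the number of reduce tasks for everything that follows in this session.
-- hive.exec.reducers.max bounds the reducer count Hive estimates per stage.
set hive.exec.reducers.max = 35;

-- A query whose GROUP BY would otherwise spawn hundreds of reducers
-- now gets at most 35; the same work is packed into fewer tasks.
select dt, count(*)
from page_views
group by dt;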

Now giving

set mapred.tasktracker.reduce.tasks.maximum = 3;

a try to see if that number is some sort of max per node (it was previously 7 on a cluster with 70 potential reducers).

Update:

 set mapred.tasktracker.reduce.tasks.maximum = 3;

Had no effect, but it was worth a try.
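A likely explanation, as far as I understand it: mapred.tasktracker.reduce.tasks.maximum is a per-node slot count that the TaskTracker daemon reads from mapred-site.xml at startup, so setting it inside a Hive session only changes the client-side copy of the value. A quick way to see what the session actually holds (running set with no value prints the current setting):

-- Prints the values as this Hive session sees them; the TaskTracker itself
-- still uses whatever was in mapred-site.xml when it was started.
set mapred.tasktracker.reduce.tasks.maximum;
set hive.exec.reducers.max;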

Answer 1:

Not exactly a solution to the question, but potentially a good compromise.

set hive.exec.reducers.max = 45;

For a super query of doom that has 400+ reducers, this jails the most expensive hive task down to 35 reducers total. My cluster currently only has 10 nodes, each node supporting 7 reducers... so in reality only 70 reducers can run at one time. By jailing the job down to fewer than 70, I've noticed a slight improvement in speed without any visible changes to the final product. I'm testing this in production to figure out exactly what is going on here. In the interim it's a good compromise solution.
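As a rough rule of thumb based on the numbers above (10 nodes x 7 reduce slots = 70 concurrent reducers), the cap can be sized just under the cluster's slot capacity so the heavy stage fits in a single reduce wave. A hedged sketch only, with the expensive query itself left as a placeholder:

-- 10 nodes * 7 reduce slots per node = 70 reducers can run at once.
-- Capping below that keeps the big stage to one reduce wave instead of
-- queueing hundreds of short-lived reducers behind the available slots.
set hive.exec.reducers.max = 45;

-- ... the expensive query goes here ...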