Hadoop: How can I prevent failed tasks from making my job fail?

Posted 2019-06-10 22:49

I'm running a hadoop job with, say, 1000 tasks. I need the job to attempt to run every task but many of the tasks will not complete and will instead throw an exception. I cannot change this behavior, but I still need the data obtained from the tasks that did not fail.

How can I make sure Hadoop goes through with all the 1000 tasks despite encountering a large number of failed tasks?

1 Answer
爷、活的狠高调
Answered 2019-06-10 23:33

In your case, you could set the maximum percentage of tasks that are allowed to fail without triggering job failure. Map tasks and reduce tasks are controlled independently, using the

mapred.max.map.failures.percent 
mapred.max.reduce.failures.percent 

properties. So if you want to keep the results from the roughly 70% of tasks that succeed even when up to 30% fail, set both properties to 30; the job will then complete as long as no more than 30% of its map tasks and no more than 30% of its reduce tasks fail.
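For example, here is a minimal driver sketch using the newer MapReduce API. In Hadoop 2+ the deprecated mapred.* properties above were renamed to mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent (either name still works through the deprecation mapping); the class and job names are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class FailureTolerantDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Allow up to 30% of map tasks and 30% of reduce tasks to fail
            // without failing the whole job. These are the Hadoop 2+ names
            // for the deprecated mapred.* properties mentioned above.
            conf.setInt("mapreduce.map.failures.maxpercent", 30);
            conf.setInt("mapreduce.reduce.failures.maxpercent", 30);

            Job job = Job.getInstance(conf, "failure-tolerant job");
            // ... configure mapper, reducer, input and output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

If you are still on the old mapred API, JobConf exposes the same settings as setMaxMapTaskFailuresPercent(int) and setMaxReduceTaskFailuresPercent(int).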
