I'm running a hadoop job with, say, 1000 tasks. I need the job to attempt to run every task but many of the tasks will not complete and will instead throw an exception. I cannot change this behavior, but I still need the data obtained from the tasks that did not fail.
How can I make sure Hadoop attempts all 1000 tasks even though it encounters a large number of task failures along the way?
In your case, you can set the maximum percentage of tasks that are allowed to fail without causing the whole job to fail. Map tasks and reduce tasks are controlled independently, via the mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent properties (mapred.max.map.failures.percent and mapred.max.reduce.failures.percent in the old API). So if you want to keep the results of the roughly 70% of tasks that succeed even when 30% fail, set both properties to 30.
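For example, here is a minimal sketch of a driver that sets these thresholds on the job configuration (property names as in Hadoop 2.x; the mapper, reducer, and paths are placeholders you would fill in for your own job):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FailureTolerantJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Allow up to 30% of map tasks and 30% of reduce tasks to fail
        // before the job itself is marked as failed.
        conf.setInt("mapreduce.map.failures.maxpercent", 30);
        conf.setInt("mapreduce.reduce.failures.maxpercent", 30);

        Job job = Job.getInstance(conf, "failure-tolerant job");
        // ... set jar, mapper, reducer, input/output paths as usual ...

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

If you are on the old org.apache.hadoop.mapred API, the equivalent JobConf setters are setMaxMapTaskFailuresPercent(int) and setMaxReduceTaskFailuresPercent(int). With either approach, tasks that fail are still retried up to the usual attempt limit; the percentage only controls when the job as a whole gives up, so the successful tasks' output is kept.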