spark.task.maxFailures not working as expected

Published 2020-08-04 10:16

I am running a Spark job with spark.task.maxFailures set to 1, and according to the official documentation:

spark.task.maxFailures

Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.

So my job should fail as soon as a task fails... However, it retries a second time before giving up. Am I missing something? I have checked the property value at runtime just in case, and it is correctly set to 1. In my case the failure happens in the last step, so the first attempt creates the output directory and the second attempt always fails because the output directory already exists, which is not really helpful.
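For reference, a minimal sketch of how I set and then verify the property at runtime (the object name, app name and local master here are just placeholders for the sketch, not my actual job):

```scala
import org.apache.spark.sql.SparkSession

object MaxFailuresCheck {
  def main(args: Array[String]): Unit = {
    // spark.task.maxFailures is read when the SparkContext is created,
    // so it has to be set before getOrCreate().
    val spark = SparkSession.builder()
      .appName("max-failures-check")
      .master("local[2]")                      // local master only for this sketch
      .config("spark.task.maxFailures", "1")   // 1 attempt => no task retries
      .getOrCreate()

    // Runtime check mentioned above: the effective value really is 1.
    println(spark.sparkContext.getConf.get("spark.task.maxFailures"))

    spark.stop()
  }
}
```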

Is there some kind of bug in this property or is the documentation wrong?

1 Answer
家丑人穷心不美
Answered 2020-08-04 10:25

That setting controls how many individual task failures are tolerated, but what you are describing sounds like the whole application failing and being resubmitted.

If you're running this on YARN, the application itself could be getting resubmitted multiple times; see yarn.resourcemanager.am.max-attempts. If so, you could turn that setting down to 1.
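If that turns out to be the cause, one option (a suggestion, not something from your question) is to cap the attempts from the Spark side with spark.yarn.maxAppAttempts, which must not exceed the cluster-wide yarn.resourcemanager.am.max-attempts. A minimal sketch, assuming client mode where the configuration is read before the YARN application is created; in cluster mode you would normally pass --conf spark.yarn.maxAppAttempts=1 to spark-submit instead:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

object SingleAttemptJob {
  def main(args: Array[String]): Unit = {
    // Cap YARN application attempts so a failed driver is not resubmitted.
    // The value may not exceed yarn.resourcemanager.am.max-attempts.
    val conf = new SparkConf()
      .setAppName("single-attempt-job")
      .set("spark.yarn.maxAppAttempts", "1")

    val spark = SparkSession.builder().config(conf).getOrCreate()

    // ... job logic would go here ...

    spark.stop()
  }
}
```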
