I am running a Spark job with spark.task.maxFailures
set to 1, and according to the official documentation:
spark.task.maxFailures
Number of individual task failures before giving up on the job. Should be greater than or equal to 1. Number of allowed retries = this value - 1.
So my job should fail as soon as a task fails... However, it is retried a second time before giving up. Am I missing something? I have checked the property value at runtime just in case, and it is correctly set to 1. In my case, it fails in the last step: the first attempt creates the output directory, and the second attempt always fails because the output directory already exists, which is not really helpful.
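For reference, this is roughly how I set and verify the property (app name and details are simplified placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Simplified reproduction of my setup
val spark = SparkSession.builder()
  .appName("max-failures-test")               // placeholder name
  .config("spark.task.maxFailures", "1")
  .getOrCreate()

// The value seen at runtime is indeed 1
println(spark.conf.get("spark.task.maxFailures"))  // prints: 1
```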
Is there some kind of bug in this property, or is the documentation wrong?
spark.task.maxFailures controls how many times an individual task may fail before Spark gives up, but what you are describing sounds like the whole job (application) failing and being retried.
If you're running this with YARN, the job itself could be resubmitted multiple times; see yarn.resourcemanager.am.max-attempts. If so, you could turn that setting down to 1.
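If that turns out to be the cause and you would rather control it from the Spark side, a minimal sketch (assuming YARN mode) is to cap both knobs explicitly; spark.yarn.maxAppAttempts limits application-level resubmissions and is itself capped by the cluster's yarn.resourcemanager.am.max-attempts:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: disable both task-level and application-level retries on YARN
val spark = SparkSession.builder()
  .appName("no-retries")                      // hypothetical app name
  .config("spark.task.maxFailures", "1")      // fail the stage on the first task failure
  .config("spark.yarn.maxAppAttempts", "1")   // do not resubmit the application on failure
  .getOrCreate()
```

With a single application attempt, the output directory created by a failed run will no longer collide with a second attempt.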