What happens when Spark master fails?

Published 2019-02-12 15:10

Does the driver need constant access to the master node, or is it only required for the initial resource allocation? What happens if the master is not available after the Spark context has been created? Does that mean the application will fail?

3 Answers
Root(大扎)
#2 · 2019-02-12 15:41

The first, and probably the most serious, consequence of a master failure or a network partition is that your cluster won't be able to accept new applications. This is why the Master is considered a single point of failure when the cluster runs with its default configuration.

Master loss will be acknowledged by the running applications, but otherwise they should continue to work more or less as if nothing happened, with two important exceptions:

  • the application won't be able to finish gracefully
  • if the master is down, or the network partition affects the worker nodes as well, slaves will try to reregisterWithMaster. If this fails multiple times, the workers will simply give up. At that point long-running applications (like streaming apps) won't be able to continue processing, but it still shouldn't result in immediate failure. Instead the application will wait for the master to come back on-line (file system recovery) or for contact from a new leader (ZooKeeper mode), and if that happens it will continue processing; see the configuration sketch below.
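For reference, a minimal spark-env.sh sketch of the two recovery options mentioned above, using the standalone-master settings from the Spark documentation; the ZooKeeper hosts and the recovery directory are placeholders:

    # ZooKeeper-based leader election ("ZooKeeper mode" above):
    # standby masters take over when the active master dies.
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 -Dspark.deploy.zookeeper.dir=/spark"

    # Alternatively, single-master file system recovery: a restarted
    # master recovers worker and application state from local disk.
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/var/lib/spark/recovery"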
叼着烟拽天下
#3 · 2019-02-12 15:45

Below are the steps a Spark application goes through when it starts (a minimal code sketch follows below):

  1. The Spark driver starts.
  2. The Spark driver connects to the Spark master for resource allocation.
  3. The Spark driver ships the jar attached to the Spark context to the cluster.
  4. The Spark driver keeps polling the master server to get the job status.
  5. If the code uses broadcast variables, the broadcast data originates from the driver (shuffle data, in contrast, is exchanged directly between executors). This is one reason the driver needs sufficient memory.
  6. Operations like take, takeOrdered, or collect accumulate data on the driver.

So, yes, a master failure will leave the executors unable to communicate with it, so they will stop working, and the driver will be unable to query it for job status. As a result, your application will fail.
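A minimal Scala sketch of those steps, assuming a standalone master at a placeholder URL (spark://master-host:7077):

    import org.apache.spark.{SparkConf, SparkContext}

    object MasterFailureDemo {
      def main(args: Array[String]): Unit = {
        // Steps 1-2: starting the driver and creating a SparkContext,
        // which registers the application with the standalone master.
        val conf = new SparkConf()
          .setAppName("master-failure-demo")
          .setMaster("spark://master-host:7077") // placeholder master URL

        val sc = new SparkContext(conf)

        // Step 6: collect() pulls every partition back to the driver,
        // so the whole result must fit in driver memory.
        val result = sc.parallelize(1 to 1000000).map(_ * 2).collect()
        println(s"Collected ${result.length} elements on the driver")

        // A graceful shutdown also needs the master to be reachable.
        sc.stop()
      }
    }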

Melony?
#4 · 2019-02-12 15:54

Yes, the driver and the master communicate constantly throughout the SparkContext's lifetime. That allows the driver to:

  • Display the detailed status of jobs / stages / tasks in its Web UI and REST API
  • Listen for job start and end events (you can add your own listeners; see the sketch below)
  • Wait for jobs to end (via the synchronous API - e.g. rdd.count() won't return until the job is completed) and fetch their results

A disconnect between the driver and the master will fail the job.
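A minimal sketch of the listener and synchronous-wait points above, using Spark's SparkListener API (a local[*] master is used here just for illustration):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

    object JobEventDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("job-event-demo").setMaster("local[*]"))

        // Custom listener: the driver receives job start/end events,
        // the same event stream that feeds the Web UI and REST API.
        sc.addSparkListener(new SparkListener {
          override def onJobStart(jobStart: SparkListenerJobStart): Unit =
            println(s"Job ${jobStart.jobId} started")
          override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit =
            println(s"Job ${jobEnd.jobId} finished: ${jobEnd.jobResult}")
        })

        // Synchronous API: count() blocks the driver until the job completes.
        val n = sc.parallelize(1 to 100).count()
        println(s"count = $n")

        sc.stop()
      }
    }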
