When using Spark in client mode (e.g. yarn-client), does the local machine that runs the driver communicate directly with the cluster worker nodes that run the remote executors?
If yes, does that mean the machine that runs the driver needs network access to the worker nodes? That is, the master node requests resources from the cluster and returns the IP addresses/ports of the worker nodes to the driver, so the driver can initiate communication with the worker nodes?
If not, how does client mode actually work?
If yes, does it mean that client mode won't work if the cluster is configured so that the worker nodes are not visible outside the cluster, and one will have to use cluster mode instead?
Thanks!
The driver connects to the Spark master and requests a context; the master then passes the driver's connection details to the Spark workers so they can communicate with it and receive instructions on what to do.
This means the driver node must be reachable on the network from the workers, and its IP must be one that is visible to them (i.e. if the driver is behind NAT while the workers are in a different network, it won't work, and you'll see errors on the workers saying they fail to connect to the driver).
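If you can open a route from the workers back to the driver, you can often make client mode workable by pinning the address and port the driver advertises. Here is a minimal Scala sketch; spark.driver.host and spark.driver.port are real Spark settings, but the host name and port values below are placeholders you'd replace with something routable from the workers:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("client-mode-example")
  .setMaster("yarn-client") // Spark 2.x+ spells this setMaster("yarn") with deploy mode "client"
  // Address the executors will use to call back into the driver;
  // it must be routable from the worker nodes (no NAT in between).
  .set("spark.driver.host", "driver-host.example.com")
  // Pin the driver's RPC port so a firewall rule can allow it
  // (by default Spark picks a random port).
  .set("spark.driver.port", "51000")

val sc = new SparkContext(conf)
```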
When you run Spark in client mode, the driver process runs locally. In cluster mode, it runs remotely on an ApplicationMaster.
In other words, all the nodes need to be able to see each other. The Spark driver definitely needs to communicate with all the worker nodes. If this is a problem, try the yarn-cluster mode instead; the driver will then run inside your cluster on one of the nodes.
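For example, here is a minimal sketch of submitting in cluster mode programmatically via SparkLauncher; the jar path and main class are placeholders for your own application:

```scala
import org.apache.spark.launcher.SparkLauncher

// Submit the application so the driver runs inside the cluster,
// not on the local machine.
val process = new SparkLauncher()
  .setAppResource("/path/to/your-app.jar") // placeholder path
  .setMainClass("com.example.YourApp")     // placeholder class
  .setMaster("yarn-cluster")               // Spark 2.x+: setMaster("yarn").setDeployMode("cluster")
  .launch()

process.waitFor()
```

The equivalent on the command line is spark-submit --master yarn-cluster (or --master yarn --deploy-mode cluster on newer versions).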