Is it always the case that the driver (the program that coordinates the job) must run on the master node?
For example, if I set up EC2 with one master and two workers, does my code containing the `main` method have to be executed from the master EC2 instance?
If the answer is NO, what would be the best way to set up the system so that the driver runs outside EC2's master node (let's say the driver runs on my computer, while the master and workers are on EC2)? Do I always have to use spark-submit, or can I do it from an IDE such as Eclipse or IntelliJ IDEA?
If the answer is YES, what would be the best reference to learn more about it (since I need to provide some sort of proof)?
Thank you kindly for your answer; references would be highly appreciated!
No, it doesn't have to be on the master.

Using `spark-submit` you can use `--deploy-mode` to control how your driver is run: either `client`, on the machine you run submit on (which could be the master or another machine), or `cluster`, on one of the workers. There is network communication between the workers and the driver, so you want the driver "close" to the workers, never across the WAN.
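To make the two modes concrete, here is a sketch of the corresponding `spark-submit` invocations. The master URL, main class, and jar name are placeholders for your own cluster and application:

```
# client mode: the driver runs on the machine where you invoke spark-submit
# (e.g. a machine in the same EC2 region as the workers)
spark-submit \
  --master spark://ec2-master-host:7077 \
  --deploy-mode client \
  --class com.example.MyApp \
  my-app.jar

# cluster mode: the driver is launched on one of the workers
spark-submit \
  --master spark://ec2-master-host:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  my-app.jar
```

In cluster mode the jar must be reachable from the workers (for example, on a shared filesystem or uploaded to them), since the driver itself starts inside the cluster.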
You can run from inside a REPL (`spark-shell`), which could be launched from your IDE. If you're using a dynamic language like Clojure, you can also just create a `SparkContext` referencing (through `master`) a local cluster, or the cluster you want to submit jobs to, and then code through the REPL. In practice it isn't this easy.