Is it always the case that the Driver must be on a Master node?

Posted 2019-08-10 03:29

Is it always the case that the Driver (the program that runs my application's main method) must be on a master node?

For example, if I set up EC2 with one master and two workers, must my code containing the main method be executed from the master EC2 instance?

If the answer is NO, what would be the best way to set up the system so that the driver is outside the EC2 master node (let's say the Driver is run from my computer, while the Master and Workers are on EC2)? Do I always have to use spark-submit, or can I do it from an IDE such as Eclipse or IntelliJ IDEA?

If the answer is YES, what would be the best reference to learn more about it (since I need to provide some sort of proof)?

Thank you kindly for your answer, references would be highly appreciated!

1 Answer
我只想做你的唯一
Answered 2019-08-10 03:48

No, it doesn't have to be on the master.

Using spark-submit, you can use --deploy-mode to control where your driver runs: in client mode it runs on the machine you invoke spark-submit from (which could be the master or any other machine), while in cluster mode it runs on one of the workers.
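To illustrate, here is a sketch of the two deploy modes. The master URL, class name, and jar name are placeholders, not values from the question:

```shell
# client mode (the default): the driver process runs on this machine --
# your laptop, the master, or any host that can reach the workers.
spark-submit \
  --master spark://ec2-master.example.com:7077 \
  --deploy-mode client \
  --class com.example.Main \
  my-app.jar

# cluster mode: spark-submit hands the driver off to the cluster,
# and it runs on one of the workers.
spark-submit \
  --master spark://ec2-master.example.com:7077 \
  --deploy-mode cluster \
  --class com.example.Main \
  my-app.jar
```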

There is constant network communication between the workers and the driver, so you want the driver 'close' to the workers, never across a WAN.

You can run code from inside a REPL (spark-shell), which could be accessed from your IDE. If you're using a dynamic language like Clojure, you can also just create a SparkContext whose master URL references a local cluster, or the cluster you want to submit jobs to, and then code through the REPL. In practice it isn't this easy.
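As a minimal sketch of that second approach, assuming a Spark standalone cluster reachable at a placeholder address, a driver launched from an IDE can point its SparkContext at the remote master while the driver process itself stays on your machine:

```scala
// Hypothetical sketch: the driver runs wherever this main() runs (e.g. your
// laptop, launched from IntelliJ), while the EC2 workers execute the tasks.
// The master URL below is a placeholder, not a value from the original post.
import org.apache.spark.{SparkConf, SparkContext}

object RemoteDriverSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("remote-driver-sketch")
      .setMaster("spark://ec2-master.example.com:7077") // standalone master

    val sc = new SparkContext(conf)

    // A trivial job: the driver plans it here; the workers execute it.
    val evens = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println(s"even numbers: $evens")

    sc.stop()
  }
}
```

Note that for this to work, the workers must be able to open connections back to the driver's host, which is exactly why running the driver across a WAN is discouraged above.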
