I have an existing, operational Spark cluster that was launched with the spark-ec2
script. I'm trying to add a new slave by following these instructions (the spark-ec2 calls I used are sketched right after the list):
- Stop the cluster
- On the AWS console, "launch more like this" on one of the slaves
- Start the cluster
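For reference, the stop/start calls look roughly like this (key pair name, identity file and cluster name are placeholders for my actual values):

./spark-ec2 stop <cluster-name>
# ... "launch more like this" on one of the slaves in the AWS console ...
./spark-ec2 -k <keypair> -i <key-file> start <cluster-name>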
Although the new instance is added to the same security group and I can successfully SSH to it with the same private key, the spark-ec2 ... start
call can't access this machine for some reason:
Running setup-slave on all cluster nodes to mount filesystems, etc...
[1] 00:59:59 [FAILURE] xxx.compute.amazonaws.com
Exited with error code 255 Stderr: Permission denied (publickey).
This is, obviously, followed by tons of other errors while trying to deploy Spark on this instance.
The reason is that the Spark master machine doesn't have rsync
access to this new slave, even though port 22 is open...
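To illustrate, a plain SSH check from the master using its own generated key (root is the default spark-ec2 login user; the slave DNS name is a placeholder) fails the same way:

# Run on the Spark master
ssh -o StrictHostKeyChecking=no root@<new-slave-dns> echo ok
# Permission denied (publickey).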
The issue was that the SSH key generated on the Spark master was not transferred to this new slave. The spark-ec2 script's start command omits this step. The solution is to use the launch command with the --resume option; then the SSH key is transferred to the new slave and everything goes smoothly.
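A minimal sketch of that call (key pair name, identity file and cluster name are placeholders for your own values):

# Re-runs the setup phase on the existing instances and redistributes
# the master's SSH key without creating new machines.
./spark-ec2 -k <keypair> -i <key-file> launch <cluster-name> --resume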
Yet another solution is to add the master's public key (~/.ssh/id_rsa.pub) to the newly added slave's ~/.ssh/authorized_keys. (Got this advice on the Spark mailing list.)
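A hedged sketch of that alternative, run from a workstation that has the AWS key pair (the .pem path and the master/slave DNS names are placeholders; root is the default spark-ec2 login user):

# Fetch the master's public key and append it to the new slave's authorized_keys
ssh -i <aws-key.pem> root@<master-dns> 'cat ~/.ssh/id_rsa.pub' | \
  ssh -i <aws-key.pem> root@<new-slave-dns> 'cat >> ~/.ssh/authorized_keys'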