Preconditions
Let's assume Apache Spark is deployed on a Hadoop cluster using YARN, and a Spark job is currently running. How does Spark handle the situations listed below?
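For context, here is a minimal sketch of the kind of workflow I mean. It just reads a pre-configured input file from HDFS and writes a result back; the application name and HDFS paths are placeholders, and the master (yarn-cluster / yarn-client) would be set at submit time via spark-submit:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark versions

// Minimal stand-in for the running workflow: a Spark job on YARN that
// depends on a pre-configured HDFS input file. Paths are placeholders.
object WorkflowSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("workflow-sketch")
    val sc = new SparkContext(conf)

    // The "pre-configured resource" from case 2 below: an HDFS file the
    // workflow depends on. If its blocks are lost, this read fails.
    val input = sc.textFile("hdfs:///data/input/events.log")
    val counts = input
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs:///data/output/word-counts")

    sc.stop()
  }
}
```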
Cases & Questions
- One node of the Hadoop cluster fails due to a disk error, but replication was high enough and no data was lost.
  - What will happen to tasks that were running on that node?
- One node of the Hadoop cluster fails due to a disk error, and replication was not high enough, so data was lost. Put simply, Spark can no longer find a file that was pre-configured as an input resource for the workflow.
  - How will Spark handle this situation?
- During execution, the primary NameNode fails over.
  - Will Spark automatically use the failover NameNode?
  - What happens if the secondary NameNode fails as well?
- For some reason, the cluster is shut down entirely in the middle of a workflow.
  - Will Spark restart automatically with the cluster?
  - Will it resume from the last "save point" in the workflow? (By "save point" I mean something like the checkpointing sketch after this list.)
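To make that last question concrete, here is roughly what I have in mind by a "save point": RDD checkpointing, which persists an RDD's data to HDFS so its lineage can be truncated. This sketch only shows how such a checkpoint would be written; whether Spark would resume from it automatically after a restart is exactly what I'm asking. The checkpoint directory and input path are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a "save point" via RDD checkpointing. Paths are placeholders.
object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
    sc.setCheckpointDir("hdfs:///spark/checkpoints")

    val data = sc.textFile("hdfs:///data/input/events.log")
                 .map(_.toUpperCase)
    data.checkpoint()      // materialized on the next action
    println(data.count())  // triggers computation and writes the checkpoint

    sc.stop()
  }
}
```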
I know some of these questions might sound odd. Anyway, I hope you can answer some or all of them. Thanks in advance. :)
Here are the answers to these questions from the mailing list (answers were provided by Sandy Ryza of Cloudera):