GridGain failover of master (sender) node

Published 2019-07-29 05:15


Question:

I am working on a batch processing problem. The solution needs to handle failing hardware.

There is a master node (which initiates task executions) and worker nodes which execute the jobs. I know how failover of worker nodes works, but I could not find any information about failover of master nodes. Whenever the master node that started a task fails, the whole task is canceled.

Is there any way to finish task processing then?

Could you suggest the best way of implementing failover of master node?

Kind Regards, Kuba

Answer 1:

Whenever your master node dies, there is basically no one left to perform the "reduce" step of your MapReduce task.

There are several ways you can try mitigating this problem:

1. Save intermediate checkpoints using GridCheckpointSpi (GridTaskSession.saveCheckpoint(..) API). When your task restarts after the node crash, check whether a checkpoint was saved and start from it.
2. Do the same as in (1), but use the data grid instead (GridCache API).
3. If you don't care about "reduce", have your jobs ignore the "cancel" call and just have them save their results in the data grid when they are done.
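
The restart-from-saved-results idea behind these options can be sketched in plain Java. This is only an illustration under stated assumptions, not GridGain code: the class `CheckpointedTask`, its methods, and the `Map` used as storage are hypothetical stand-ins for whatever durable store you pick (the checkpoint SPI or the data grid). Each job result is saved under its job ID as soon as it is computed, so a restarted master only runs the jobs that have no saved result.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names come from the GridGain API.
public class CheckpointedTask {
    // Stand-in for durable storage that survives a master crash
    // (for example a checkpoint SPI or a data grid cache).
    private final Map<String, Integer> store;

    // Number of jobs actually executed by this task instance.
    private int executed = 0;

    public CheckpointedTask(Map<String, Integer> store) {
        this.store = store;
    }

    // Runs every job that has no saved result, saves each new result
    // immediately, and then "reduces" all results by summing them.
    public int run(List<String> jobIds, Function<String, Integer> job) {
        int sum = 0;
        for (String id : jobIds) {
            Integer result = store.get(id);
            if (result == null) {
                // No checkpoint for this job: execute it and save the result.
                result = job.apply(id);
                store.put(id, result);
                executed++;
            }
            sum += result;
        }
        return sum;
    }

    public int executedJobs() {
        return executed;
    }
}
```

If the first master crashes after some jobs have written their results, a new `CheckpointedTask` created over the same store skips those jobs and finishes the rest. With a real grid, the store would have to live outside the master node for this to help.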

--Best