Akka actor Kill/restart behavior

Posted 2019-05-15 01:44

Question:

I am confused by behavior I am seeing in Akka. Briefly, I have a set of actors performing scientific calculations (star formation simulation). They have some state. When an error occurs such that one or more enter an invalid state, I want to restart the whole set to start over. I also want to do this if a single calc (over the entire set) takes too long (there is no way to predict in advance how long it may run).

So, there is the set of Simulation actors at the bottom of the tree, then a Director above them (that creates them via a Router, and sends them messages via that Router as well). There is one more Director level above that to create Directors on different machines and collect results from them all.

I handle the timeout case by using the Akka Scheduler to create a one-time timeout event, in the local Director, when the simulation is started. When the Director gets this event, if all its Simulation actors have not finished, it does this:

children ! Broadcast(Kill)

where children is the Router that owns/created them - this sends a Kill to all the children (SimulActors).
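For reference, the one-time timeout described above could be wired up roughly as follows. This is only a sketch under assumptions: it uses classic Akka actors, the message names StartSimulation, AllFinished and SimulationTimeout are hypothetical, and children stands for the router's ActorRef, passed in here rather than created by the Director.

```scala
import akka.actor.{Actor, ActorRef, Cancellable, Kill}
import akka.routing.Broadcast
import scala.concurrent.duration.FiniteDuration

// Hypothetical protocol messages, named for illustration only
case object StartSimulation
case object AllFinished
case object SimulationTimeout

// `children` stands for the router ActorRef from the question
class Director(children: ActorRef, limit: FiniteDuration) extends Actor {
  import context.dispatcher // execution context the scheduler runs on

  private var finished = false
  private var timer: Option[Cancellable] = None

  def receive: Receive = {
    case StartSimulation =>
      finished = false
      timer.foreach(_.cancel()) // drop a timeout left over from an earlier run
      children ! Broadcast(StartSimulation)
      // one-time event delivered to this Director after `limit`
      timer = Some(context.system.scheduler.scheduleOnce(limit, self, SimulationTimeout))

    case AllFinished =>
      finished = true
      timer.foreach(_.cancel())

    case SimulationTimeout =>
      // only act if the run is still going when the timeout arrives
      if (!finished) children ! Broadcast(Kill)
  }
}
```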

What I thought would occur is that all the child actors would be restarted. However, their preRestart() hook method is never called. I see the Kill message received, but that's it.

I must be missing something fundamental here. I have read the Akka docs on this topic and I have to say I find them less than clear (especially the page on Supervisors). I would really appreciate either a thorough explanation of the Kill/restart process, or just some other references (Google wasn't very helpful).

Answer 1:

Note

If the child of a router terminates, the router will not automatically spawn a new child. In the event that all children of a router have terminated the router will terminate itself.

Taken from the Akka docs.
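To connect that note to the question: Kill makes the target actor throw an ActorKilledException, and Akka's default supervisor strategy answers that exception with Stop rather than Restart, which would explain why preRestart() never runs. Below is a minimal sketch of a supervisor that restarts every child when one of them is killed. It assumes classic Akka actors and plain child actors instead of a router; Worker, Parent and Supervision are hypothetical names.

```scala
import akka.actor.{Actor, ActorKilledException, AllForOneStrategy, Kill, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart

object Supervision {
  // Kill reaches the supervisor as an ActorKilledException; the default
  // strategy would stop the child, so map it to Restart explicitly.
  val restartOnKill: SupervisorStrategy.Decider = {
    case _: ActorKilledException => Restart
    case _: Exception            => Restart // e.g. a child entering an invalid state
  }
}

class Worker extends Actor {
  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    println(s"${self.path.name}: preRestart because of $reason")
    super.preRestart(reason, message)
  }

  def receive: Receive = { case _ => () }
}

class Parent extends Actor {
  // AllForOneStrategy applies the decision to every child, not only the failed one
  override val supervisorStrategy = AllForOneStrategy()(Supervision.restartOnKill)

  private val workers =
    (1 to 3).map(i => context.actorOf(Props(new Worker), s"worker-$i"))

  def receive: Receive = {
    case "timeout" => workers.head ! Kill // one Kill is enough to restart the whole set
  }
}
```

To try it, create the parent with ActorSystem("demo").actorOf(Props(new Parent), "parent") and send it the string "timeout". With a router in between, the router's own supervision settings also come into play, so treat this as an illustration of the Kill/restart mechanics only.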



Answer 2:

I would consider using a supervision strategy. Akka has built-in behavior for applying one decision to all child actors (the all-for-one strategy), and you can choose the specific directive, e.g. Restart.

I think a more idiomatic way to run this would be to have the actors throw an exception if they are not done after a period of time, and then have the supervisor handle that through its supervision strategy.

You could throw a not done exception from the child and then define the behaviour like so:

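import akka.actor.AllForOneStrategy
import akka.actor.SupervisorStrategy.{Restart, Stop}
// Note: maxNrOfRetries = 0 permits no restarts, so a Restart decision ends up
// stopping the children; use a positive limit (or -1 for none) to allow restarts.
override val supervisorStrategy =
    AllForOneStrategy(maxNrOfRetries = 0) {
      case _: NotDoneException => Stop     // the "not done" signal thrown by a child
      case _: Exception        => Restart  // any other failure
    }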

It's important to understand that a restart means discarding the old actor instance and creating a new, separate object behind the same ActorRef, so any state held in the old instance is lost.
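A small sketch of what that means for a child actor, assuming classic Akka actors. NotDoneException is named in the answer but never defined, so the definition here is hypothetical, as are the Step and CheckDeadline messages.

```scala
import akka.actor.Actor

// Hypothetical definition; the answer only names this exception type
class NotDoneException(msg: String) extends RuntimeException(msg)

// Hypothetical messages for illustration
case object Step
case object CheckDeadline

class SimulActor extends Actor {
  // Plain field of this instance: a restarted actor is a new object,
  // so the counter starts again from 0 after a restart.
  private var steps = 0

  override def preRestart(reason: Throwable, message: Option[Any]): Unit = {
    // called on the OLD instance, just before it is discarded
    println(s"discarding instance after $steps steps: $reason")
    super.preRestart(reason, message)
  }

  override def postRestart(reason: Throwable): Unit = {
    // called on the NEW instance, which has fresh state
    println(s"fresh instance starting with $steps steps")
    super.postRestart(reason)
  }

  def receive: Receive = {
    case Step          => steps += 1
    case CheckDeadline => throw new NotDoneException(s"not finished after $steps steps")
  }
}
```

Whether the thrown exception leads to a restart or a stop is decided by the parent's supervisor strategy, not by the child.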

References:

http://doc.akka.io/docs/akka/snapshot/scala/fault-tolerance.html

http://doc.akka.io/docs/akka/snapshot/general/supervision.html



Tags: scala akka