I have a set of actors that are somewhat stateless and perform similar tasks.
Each of these workers is unreliable and potentially low performing. In my design- I can easily spawn more actors to replace lazy ones.
The performance of an actor is assessed by itself. Is there a way to make the supervisor/actor pool do this assessment, to help decide which workers are slow enough for me to replace? Or is my current strategy "the" right strategy?
I'm new to akka myself, so only trying to help, but my attack would be something along the following lines:
Write your own routing logic, something along the following lines https://github.com/akka/akka/blob/v2.3.5/akka-actor/src/main/scala/akka/routing/SmallestMailbox.scala Keep in mind that a new instance is created for every pool, so each instance can store information about how many messages have been processed by each actor so far. In this instance, once you find an actor underperforming, mark it as 'removable' (once it is no longer processing any new messages) in a separate data structure and stop sending further messages.
Write your own router pool: override createRouterActor https://github.com/akka/akka/blob/v2.3.5/akka-actor/src/main/scala/akka/routing/RouterConfig.scala:236 to provide your own CustomRouterPoolActor
Write your CustomRouterPoolActor along the following lines: https://github.com/akka/akka/blob/8485cd2ebb46d2fba851c41c03e34436e498c005/akka-actor/src/main/scala/akka/routing/Resizer.scala (See ResizablePoolActor). This actor will have access to your strategy instance. From this strategy instance- remove the routees already marked for removal. Look at ResizablePoolCell to see how to remove actors.
Question is - why some of your workers perform badly? Is there anything difference between them (I assume not). If not, that maybe some payloads simply require more work the the others - what's the point of terminating them then?
Once we had similar problem - and used SmallestMailboxRoutingLogic. It basically try to distribute the workload based on mailbox sizes.
Anyway, I would rather try to answer the question - why some of the workers are unstable and perform poorly - because this looks like a biggest problem you are just trying to cover elsewhere.