In experimenting with the ZeroMQ
Push/Pull
(what they call Pipeline
) socket type, I'm having difficulty understanding the utility of this pattern. It's billed as a "load-balancer".
Given a single server sending tasks to a number of workers, Push/Pull will evenly hand out the tasks between all the clients. 3 clients and 30 tasks, each client gets 10 tasks: client1 gets tasks 1, 4, 7,... client2, 2, 5,... and so on. Fair enough. Literally.
However, in practice there is often a non-homogeneous mix of task complexity or client compute resources (or availability), then this pattern breaks badly. All the tasks seem to be scheduled in advance, and the server has no knowledge of the progress of the clients or if they are even available. If client1 goes down, its remaining tasks are not sent to the other clients, but remain queued for client1. If client1 remains down, then those tasks are never handled. Conversely, if a client is faster at processing its tasks, it doesn't get further tasks and remains idle, as they remain scheduled for the other clients.
Using REQ/REP
is one possible solution; tasks are then only given to an available resource .
So am I missing something? How is Push/Pull
to be used effectively?
Is there a way to handle the asymmetry of clients, tasks, etc, with this socket type?
Thanks!
Here's a simple Python example:
# server
import zmq
import time
context = zmq.Context()
socket = context.socket(zmq.PUSH)
#socket = context.socket(zmq.REP) # uncomment for Req/Rep
socket.bind("tcp://127.0.0.1:5555")
i = 0
time.sleep(1) # naive wait for clients to arrive
while True:
#msg = socket.recv() # uncomment for Req/Rep
socket.send(chr(i))
i += 1
if i == 100:
break
time.sleep(10) # naive wait for tasks to drain
.
# client
import zmq
import time
import sys
context = zmq.Context()
socket = context.socket(zmq.PULL)
#socket = context.socket(zmq.REQ) # uncomment for Req/Rep
socket.connect("tcp://127.0.0.1:5555")
delay = float(sys.argv[1])
while True:
#socket.send('') # uncomment for Req/Rep
message = socket.recv()
print "recv:", ord(message)
time.sleep(delay)
Fire up 3 clients with a delay parameter on the command line (ie, 1, 1, and 0.1) and then the server, and see how all the tasks are evenly distributed. Then kill one of the clients to see that its remaining tasks aren't handled.
Uncomment the lines indicated to switch it to a Req/Rep
type socket and watch a more effective load-balancer.