Use of initializer in Python multiprocessing worker

Posted 2019-07-23 18:30

I was looking into multiprocessing.Pool for workers, trying to initialize the workers with some state. The pool can take a callable, initializer, but that callable isn't passed a reference to the initialized worker. The few examples I've seen that use it rely on global variables, which seems really nasty.

Is there any good way to initialize worker state using multiprocessing.Pool?

Edit: An example:

I have workers, each of which does some relatively expensive initialisation (binding to a socket) that I don't want to repeat every time. I could initialize my sockets by hand and then pass them in when I assign work, but sharing file descriptors across processes is complicated, if not impossible. So I would have to initialize and bind every time I wanted to process a request.

1 answer
贪生不怕死
Post #2 · 2019-07-23 19:16

Technically speaking, the right thing to do would be to pass the result of the initialization function as an argument to every function executed by the worker.

That said, in this context global variables are fine and safe: each worker is a separate process with its own copy of the module's globals, so they end up as private objects living in separate address spaces rather than as shared state.
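
A quick way to convince yourself of this is to have the initializer store its own process ID in a global and have every task report both the current PID and the stored value. A minimal sketch (the names `init_worker`, `whoami` and `_state` are illustrative only, not part of any API):

```python
import multiprocessing
import os

_state = None  # module-level global; each worker process holds its own copy

def init_worker():
    # Pool initializer: runs once in every worker process when the pool starts.
    global _state
    _state = os.getpid()

def whoami(_):
    # Report which process ran this task and what that process's global holds.
    return os.getpid(), _state

if __name__ == "__main__":
    with multiprocessing.Pool(3, initializer=init_worker) as pool:
        results = pool.map(whoami, range(10))
    # Every task sees the value stored by its own worker, never another worker's.
    assert all(pid == state for pid, state in results)
    print(sorted({pid for pid, _ in results}))  # the worker PIDs that did the work
```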

My general suggestion is to write the core functions in a clean, reentrant style (taking their context as explicit arguments) and to use global variables only in the thin layer that plugs them into multiprocessing.
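
One possible way to combine both ideas is to keep the real work in functions that receive their context as an explicit first argument, and to confine the global to a tiny adapter that injects it. This is a sketch under assumptions: `init_context`, `with_context`, `make_prefix` and `tag` are hypothetical names, and the task functions must be defined at module level so they can be pickled by reference.

```python
import multiprocessing

_context = None  # per-process slot, filled once by the pool initializer

def init_context(factory, *args):
    """Pool initializer: build this worker's context once and keep it."""
    global _context
    _context = factory(*args)

def with_context(task):
    """Adapter: unpack (func, data) and pass the stored context explicitly."""
    func, data = task
    return func(_context, data)

def make_prefix(prefix):
    # Hypothetical stand-in for an expensive setup step such as binding a socket.
    return prefix

def tag(prefix, data):
    # Reentrant: the result depends only on the arguments it receives.
    return prefix + data

if __name__ == "__main__":
    with multiprocessing.Pool(2, initializer=init_context,
                              initargs=(make_prefix, ">")) as pool:
        print(pool.map(with_context, [(tag, "foo"), (tag, "bar")]))  # ['>foo', '>bar']
```

Because `tag` never touches the global, it can be unit-tested simply by calling `tag(">", "foo")`.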

Sticking with your example, the following send function needs some context (in this case, a socket):

def send(sock, data):
    """Process one piece of data with an already-initialized socket."""
    result = None  # ... your code here
    return result

The initialization code and the base code executed by the worker will rely on global variables for convenience.

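import multiprocessing
import socket

sock = None  # per-process global: each worker process gets its own copy

def init(address, port):
    # Pool initializer: runs once in every worker process at startup.
    global sock
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # e.g. the expensive setup
    sock.bind((address, port))

def job(data):
    # Only reads the global, so no 'global' statement is needed here.
    assert sock is not None, "init() has not run in this process"
    return send(sock, data)

if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
    N, address, port = 4, "127.0.0.1", 0  # port 0: each worker binds its own free port
    with multiprocessing.Pool(N, initializer=init, initargs=(address, port)) as pool:
        print(pool.map(job, ['foo', 'bar', 'baz']))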

Written this way, the code is simple and natural to test without multiprocessing: call init() once, then call job() directly. You can think of the global state as a per-process context capsule, which is safe because no other process ever sees it.

As an additional practical point, keep in mind that multiprocessing is not very good at sending complex data around (e.g. callbacks), because what it passes between processes has to be pickled. The best approach is to send simple pieces of data (strings, lists, dictionaries, collections.namedtuple ...) and to reconstruct the complex data structures on the worker side (in the initialization function).
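
A small sketch of that pattern: a plain, picklable `Config` namedtuple travels through `initargs`, and the initializer builds the complex object (here a hypothetical `Greeter` holding a closure, which pickle rejects) inside each worker. All names are illustrative assumptions.

```python
import collections
import multiprocessing
import pickle

# Hypothetical config record: simple, picklable data describing the worker's setup.
Config = collections.namedtuple("Config", ["address", "port", "greeting"])

class Greeter:
    """Stand-in for a complex worker-side object (e.g. one holding a socket)."""
    def __init__(self, config):
        self.config = config
        self.render = lambda data: f"{config.greeting}, {data}"  # closure: not picklable

_worker = None  # the complex object, rebuilt inside each worker process

def init(config):
    # Receives only the simple Config and builds the complex object locally.
    global _worker
    _worker = Greeter(config)

def job(data):
    return _worker.render(data)

def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:  # the exact exception type depends on the object
        return False

if __name__ == "__main__":
    config = Config("127.0.0.1", 9000, "hello")
    print(is_picklable(config), is_picklable(Greeter(config)))  # True False
    with multiprocessing.Pool(2, initializer=init, initargs=(config,)) as pool:
        print(pool.map(job, ["foo", "bar"]))  # ['hello, foo', 'hello, bar']
```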
