If I need to share a multiprocessing.Queue or a multiprocessing.Manager (or any of the other synchronization primitives), is there any difference between defining them at the global (module) level and passing them as an argument to the function executed in a different process?
For example, here are three possible ways I can imagine a queue could be shared:
# works fine on both Windows and Linux
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

def main():
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())  # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()
vs.
# works fine on Linux, hangs on Windows
from multiprocessing import Process, Queue

q = Queue()

def f():
    q.put([42, None, 'hello'])

def main():
    p = Process(target=f)
    p.start()
    print(q.get())  # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()
vs.
# works fine on Linux, NameError on Windows
from multiprocessing import Process, Queue

def f():
    q.put([42, None, 'hello'])

def main():
    p = Process(target=f)
    p.start()
    print(q.get())  # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    q = Queue()
    main()
Which is the correct approach? I'm guessing from my experimentation that it's only the first one, but I wanted to confirm that this is officially the case (and not only for Queue but also for Manager and other similar objects).
As mentioned in the programming guidelines:
Explicitly pass resources to child processes
On Unix using the fork start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.
Apart from making the code (potentially) compatible with Windows and the other start methods this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process.
The issue is the way the spawn and forkserver start methods work under the hood (Windows supports only spawn). Instead of cloning the parent process with its memory and file descriptors, they build the child from scratch: a fresh Python interpreter is loaded, told which modules to import, and launched. This means your global variable will be a brand new Queue instead of the parent's one.
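To make the re-import visible, here is a minimal sketch. The names MODULE_LOADED_IN_PID, report and describe are made up for this illustration, and the spawn start method is forced so the behaviour should be the same on Linux and Windows. The module-level value records which process executed the module's top-level code; with spawn I would expect the child to report its own pid there.

```python
import multiprocessing as mp
import os

# Hypothetical marker: evaluated every time this module's top-level code runs,
# including when a spawned child re-imports the module.
MODULE_LOADED_IN_PID = os.getpid()


def report(q):
    # Runs in the child: send back the pid that executed the module-level
    # code as seen from here, together with the child's own pid.
    q.put((MODULE_LOADED_IN_PID, os.getpid()))


def describe(loaded_in, child_pid):
    # If the two pids match, the child ran the module-level code itself.
    if loaded_in == child_pid:
        return "module re-executed in child"
    return "module state inherited from parent"


def main():
    ctx = mp.get_context("spawn")  # assumption: force spawn for the demo
    q = ctx.Queue()
    p = ctx.Process(target=report, args=(q,))
    p.start()
    loaded_in, child_pid = q.get(timeout=30)  # timeout so a failure cannot hang
    p.join()
    print(describe(loaded_in, child_pid))


if __name__ == "__main__":
    main()
```

Swapping "spawn" for "fork" on Linux should flip the result, because a forked child inherits the value the parent already computed. The same mechanism is why a module-level Queue ends up being a different object in a spawned child.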
Another implication is that the objects you want to pass to the new process must be picklable, as they will be sent through a pipe.
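A quick way to get a feel for the pickling requirement is to try pickling the arguments yourself. The helper below (can_cross_process_boundary is a name I made up) is only an approximation: multiprocessing uses its own pickler, which additionally knows how to transfer objects such as multiprocessing.Queue while a process is being started, so plain pickle will reject some things that Process accepts.

```python
import pickle
import threading


def can_cross_process_boundary(obj):
    """Rough check: return True if obj can be serialized with plain pickle."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Plain data is fine.
print(can_cross_process_boundary([42, None, "hello"]))
# A lambda cannot be pickled by reference, so it is rejected.
print(can_cross_process_boundary(lambda: 42))
# A threading lock is tied to this process and is rejected as well.
print(can_cross_process_boundary(threading.Lock()))
```

In practice this mostly means: use module-level functions as targets and pass plain data or multiprocessing's own primitives as arguments.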
Just summarizing the answer from Davin Potts:
The only portable solution is to share Queue() and Manager().* objects by passing them as arguments, never as global variables. The reason is that on Windows all the global variables will be re-created (rather than copied) by literally running the module's code from the beginning (very little information is actually copied from the parent process to the child process); so a brand new Queue() would be created, and of course (without some undesirable and confusing magic) it can't possibly be connected to the Queue() in the parent process.
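Since the question also asks about Manager, here is a sketch of the "pass everything as arguments" pattern with both a Queue and a Manager dict. The function name worker and the squared-number payload are just placeholders for illustration; the spawn context is chosen on purpose so the same code path is exercised on Linux as on Windows.

```python
import multiprocessing as mp


def worker(results, shared, n):
    # Both handles arrive as arguments; nothing is read from module globals.
    shared[n] = n * n
    results.put(n)


def main():
    ctx = mp.get_context("spawn")  # assumption: mimic Windows everywhere
    with ctx.Manager() as manager:
        shared = manager.dict()  # proxy object, can be passed to children
        results = ctx.Queue()
        procs = [
            ctx.Process(target=worker, args=(results, shared, n))
            for n in range(3)
        ]
        for p in procs:
            p.start()
        # Collect one message per child; the timeout avoids hanging forever.
        done = sorted(results.get(timeout=30) for _ in procs)
        for p in procs:
            p.join()
        print(done, dict(shared))


if __name__ == "__main__":
    main()
```

A side benefit of this style is that worker only needs something with put() and something dict-like, so it can be exercised in a single process with queue.Queue and a plain dict.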
My understanding is that there is no disadvantage to passing Queue(), etc. as parameters; I can't find any reason why anyone would want to use a non-portable solution with global variables, but of course I may be wrong.