My understanding of Python multiprocessing is that when you create a new process with multiprocessing.Process(), it makes an entire copy of your current program in memory and continues working from there. With that in mind, I'm confused by the behaviour of the following script.
WARNING: This script will allocate about 1 GB of memory! Run it with caution!
import multiprocessing
import numpy as np
from time import sleep

# Declare a dictionary globally
bigDict = {}

def sharedMemory():
    # Using numpy, store ~1 GB of random data (1000 chunks of 125,000 float64s)
    for i in xrange(1000):
        bigDict[i] = np.random.random(125000)
    bigDict[0] = "Known information"

    # In System Monitor, 1 GB of memory is being used
    sleep(5)

    # Start 4 processes - each should get a copy of the 1 GB dict
    for _ in xrange(4):
        p = multiprocessing.Process(target=workerProcess)
        p.start()
    print "Done"

def workerProcess():
    # Sleep - only 1 GB of memory is in use overall, not the expected 4 GB
    sleep(5)
    # Each process can read the dictionary, even though it was never explicitly shared
    print multiprocessing.current_process().pid, bigDict[0]

if __name__ == "__main__":
    sharedMemory()
The above program illustrates my confusion - the dict seems to become shared between the processes automatically. I thought that to get that behaviour I had to use a multiprocessing Manager, along the lines of the sketch below. Could someone explain what is going on?
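For context, this is roughly the Manager-based approach I assumed was necessary. It is a minimal sketch rather than my actual code: the dict here is small, and the names are mine. A managed dict lives in a separate server process, and workers reach it through a proxy object:

import multiprocessing

def workerProcess(sharedDict):
    # The proxy forwards this lookup to the manager's server process
    print multiprocessing.current_process().pid, sharedDict[0]

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    # The dict is held by the manager's server process; this is a proxy to it
    sharedDict = manager.dict()
    sharedDict[0] = "Known information"

    workers = []
    for _ in xrange(4):
        p = multiprocessing.Process(target=workerProcess, args=(sharedDict,))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()

With a managed dict, every read from a worker goes through the manager process - that is the kind of explicit sharing I expected to be required before the workers could see bigDict[0].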