Why do processes spawned by the multiprocessing module appear to share memory?

Posted 2019-08-05 11:29

Question:

My impression of Python's multiprocessing module is that when you create a new process with multiprocessing.Process(), it creates an entire copy of your current program in memory and continues working from there. With that in mind, I'm confused by the behaviour of the following script.

WARNING: This script will allocate a large amount of memory! Run it with caution!

import multiprocessing
import numpy as np
from time import sleep

#Declare a dictionary globally
bigDict = {}

def sharedMemory():
    #Using numpy, store 1GB of random data
    for i in range(1000):
        bigDict[i] = np.random.random((125000))
    bigDict[0] = "Known information"

    #In System Monitor, 1GB of memory is being used
    sleep(5)

    #Start 4 processes - each should get a copy of the 1GB dict
    for _ in range(4):
        p = multiprocessing.Process(target=workerProcess)
        p.start()

    print("Done")

def workerProcess():
    #Sleep - only 1GB of memory is being used, not the expected 4GB
    sleep(5)

    #Each process has access to the dictionary, as though the memory were shared
    print(multiprocessing.current_process().pid, bigDict[0])

if __name__ == "__main__":
    sharedMemory()

The above program illustrates my confusion - it seems like the dict automatically becomes shared between the processes. I thought that to get that behaviour I had to use a multiprocessing Manager. Could someone explain what is going on?
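For reference, the Manager-based approach I had in mind is roughly the sketch below (not part of the script above, just an illustration). With a Manager, writes made in the workers really are visible to the parent, because every access goes through a proxy to a separate manager process:

import multiprocessing

def workerProcess(shared):
    #Each worker writes into the managed dict; the write propagates
    #back through the manager process
    shared[multiprocessing.current_process().pid] = "written by worker"

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        shared[0] = "Known information"

        workers = [multiprocessing.Process(target=workerProcess, args=(shared,))
                   for _ in range(4)]
        for p in workers:
            p.start()
        for p in workers:
            p.join()

        #The parent sees the workers' entries as well as its own
        print(dict(shared))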

Answer 1:

On Linux, forking a process doesn't immediately double the amount of memory in use. Instead, the page table of the new process is set up to point at the same physical memory as the old process, and a page is only actually copied when one of the processes attempts to write to it (copy-on-write, COW). The result is that both processes appear to have separate memory, but additional physical memory is only allocated once one of the processes actually writes to a page.
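You can convince yourself that the memory only looks shared by writing to it from a child process: the write forces the touched page to be copied into the child, and the parent never sees the change. A minimal sketch (assuming the fork start method, which is the default on Linux):

import multiprocessing
import os

bigDict = {0: "Known information"}

def workerProcess():
    #This write triggers copy-on-write: the touched page is duplicated
    #into the child, and the parent's page stays untouched
    bigDict[0] = "changed in child"
    print(os.getpid(), bigDict[0])

if __name__ == "__main__":
    p = multiprocessing.Process(target=workerProcess)
    p.start()
    p.join()
    #The parent still prints "Known information" - the child's write
    #never reached the parent's memory
    print(os.getpid(), bigDict[0])

The child prints the modified value while the parent still prints the original one, even though both started out pointing at the same physical pages.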