python: problems with accessing variables while us

Posted 2019-08-07 19:23

Question:

I am new to multiprocessing in Python, and I have a problem accessing variables when I try to use multiprocessing in my code. Sorry if I sound naive, but I just can't figure it out. Below is a simplified version of my scenario.

class Data:
    def __init__(self):
        self.data = "data"
    def datameth(self):
        print self.data
        print mainvar

class First:
    def __init__(self):
        self.first = "first"
    def firstmeth(self):
        d = Data()
        d.datameth()
        print self.first

def mymethod():
    f = First()
    f.firstmeth()

if __name__ == '__main__':
    mainvar = "mainvar"
    mymethod()

When I run this, it works fine and gives the output:

data
mainvar
first

But when I try to run mymethod() as a process:

from multiprocessing import Process
class Data:
    def __init__(self):
        self.data = "data"
    def datameth(self):
        print self.data
        print mainvar

class First:
    def __init__(self):
        self.first = "first"
    def firstmeth(self):
        d = Data()
        #print mainvar
        d.datameth()
        print self.first


def mymethod():
    f = First()
    f.firstmeth()

if __name__ == '__main__':
    mainvar = "mainvar"
    #mymethod()
    p = Process(target = mymethod)
    p.start()

I get an error like this:

NameError: global name 'mainvar' is not defined

The point is, I am not able to access mainvar from inside the First class or the Data class. What am I missing here?

Edit: In my real scenario, mainvar is not just a literal; it is the return value of a method after some processing.

if __name__ == '__main__':
    ***some other stuff***
    mainvar = ***return value of some method**
    p = Process(target = mymethod)
    p.start()

Edit 2: As @dciriello mentioned in the comments, it works fine on Linux but not on Windows :(

Answer 1:

This is a limitation of Windows, because it doesn't support fork. When a child process is forked on Linux, it gets a copy-on-write replica of the parent process's state, so the mainvar you defined inside if __name__ == "__main__": will be there. On Windows, however, the child process's state is created by re-importing the __main__ module of the program. This means that mainvar doesn't exist in the children, because it's only created inside the if __name__ == "__main__": guard. So, if you need to access mainvar inside a child process, your only option is to explicitly pass it to the child as an argument to mymethod in the Process constructor:

mainvar = "whatever"
p = Process(target=mymethod, args=(mainvar,))
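For the snippet above to work, mymethod also has to accept the value as a parameter. Here is a minimal, self-contained sketch of the idea, written in Python 3 syntax (the question uses Python 2 print statements). The describe helper and the results queue are hypothetical additions used only to get the child's output back to the parent; they are not part of the original code.

```python
from multiprocessing import Process, Queue


def describe(mainvar):
    # Hypothetical helper: builds the message the child reports back.
    return "child saw: " + mainvar


def mymethod(mainvar, results):
    # mainvar arrives as an argument, so the child does not depend on a
    # global that only exists in the parent's __main__ block.
    results.put(describe(mainvar))


if __name__ == "__main__":
    mainvar = "mainvar"          # could equally be the return value of a method
    results = Queue()            # assumption: a queue is used to collect output
    p = Process(target=mymethod, args=(mainvar, results))
    p.start()
    print(results.get(timeout=30))  # read the result before joining
    p.join()
```

Anything passed through args has to be picklable when the child is created by re-importing the module, which plain strings, numbers, and most simple containers are.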

This best practice is mentioned in the multiprocessing docs:

Explicitly pass resources to child processes

On Unix a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process.

Apart from making the code (potentially) compatible with Windows this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process.

Notice the phrase "compatible with Windows": though it's not quite spelled out, the reason this helps with Windows compatibility is that it avoids the exact issue you're seeing.

This is also covered in the section of the docs that talks specifically about Windows limitations caused by the lack of fork:

Global variables

Bear in mind that if code run in a child process tries to access a global variable, then the value it sees (if any) may not be the same as the value in the parent process at the time that Process.start was called.

However, global variables which are just module level constants cause no problems.

Note the "if any". Because your global variable is declared inside the if __name__ == "__main__": guard, it doesn't even show up in the child.
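The difference between a module-level constant and a name defined inside the guard can be reproduced on any platform by asking for the "spawn" start method, which is the one Windows uses. The following is a rough sketch in Python 3 syntax; MODULE_CONSTANT, report, and child are hypothetical names chosen for illustration.

```python
import multiprocessing as mp

# Defined at import time, so it is recreated when a spawned child
# re-imports this module.
MODULE_CONSTANT = "defined at import time"


def report():
    # Checks which names exist as module globals in the current process.
    names = globals()
    return {
        "MODULE_CONSTANT": "MODULE_CONSTANT" in names,
        "mainvar": "mainvar" in names,
    }


def child(results):
    results.put(report())


if __name__ == "__main__":
    mainvar = "mainvar"              # exists only in the parent process
    ctx = mp.get_context("spawn")    # mimic the Windows start method
    results = ctx.Queue()
    p = ctx.Process(target=child, args=(results,))
    p.start()
    print("child: ", results.get(timeout=30))
    p.join()
    print("parent:", report())
```

When run as a script, the child should report that MODULE_CONSTANT exists and mainvar does not, while the parent sees both. Replacing "spawn" with "fork" on Linux should make mainvar visible in the child as well, which matches the behavior described in the question.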



Answer 2:

Operating systems don't allow processes to share variables easily. If they did, each process could steal data from any other process, which you never want (for example, when you enter your credit card details in a web browser).

So when you use the multiprocessing module, you have to use special facilities such as Value and Array to share variables (a.k.a. "state") between the individual processes. See the documentation for details.
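As a rough illustration of Value and Array, the sketch below (Python 3 syntax) lets a child process modify a shared integer and a shared integer array that the parent then reads. The update function and the variable names are hypothetical; only Process, Value, and Array come from the multiprocessing module.

```python
from multiprocessing import Process, Value, Array


def update(counter, numbers):
    # Value and Array live in shared memory, so changes made here are
    # visible to the parent after the child finishes.
    with counter.get_lock():
        counter.value += 1
    for i in range(len(numbers)):
        numbers[i] *= 2


if __name__ == "__main__":
    counter = Value("i", 0)          # "i" is the typecode for a signed int
    numbers = Array("i", [1, 2, 3])
    p = Process(target=update, args=(counter, numbers))
    p.start()
    p.join()
    print(counter.value, numbers[:])
```

Note that the shared objects are still handed to the child through args, so this approach also works when the child is created by re-importing the module rather than by fork.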



Answer 3:

You are defining 'mainvar' in the wrong place.

Try the following:

from multiprocessing import Process

mainvar = "mainvar"
class Data:
    def __init__(self):
        self.data = "data"
    def datameth(self):
        print self.data
        print mainvar

class First:
    def __init__(self):
        self.first = "first"
    def firstmeth(self):
        d = Data()
        #print mainvar
        d.datameth()
        print self.first


def mymethod():
    f = First()
    f.firstmeth()

if __name__ == '__main__':
    #mainvar = "mainvar"
    #mymethod()
    p = Process(target = mymethod)
    p.start()