Python - Multiprocess, member functions of classes

Posted 2019-08-04 16:01

Question:

I can't figure out whether this is my mistake or a quirk of the multiprocessing module in Python 2.7. Can anyone see why this is not working?

import sys
from multiprocessing import Pool as mp #Pool is the class; lowercase pool is only the submodule
class encapsulation:
   def __init__(self):
       self.member_dict = {}
   def update_dict(self,index,value):
       self.member_dict[index] = value
encaps = encapsulation()
def method(argument):
   encaps.update_dict(argument,argument)
   print encaps.member_dict
p = mp() #sets up multiprocess pool of processors
p.map(method,sys.argv[1:]) #method is the function, sys.argv is the list of arguments to multiprocess
print encaps.member_dict
>>>{argument:argument}
>>>{}

So my question is just about member variables. It is my understanding that the class encapsulation should hold this dictionary both inside and outside of the function. Why does it reset and give me an empty dictionary even though I have only initialized it once? Please help.

Answer 1:

Even though you are encapsulating the object, the multiprocessing module gives each worker process its own local copy of the object and never propagates your changes back to the parent. You are also not using Pool.map as intended: it expects each call to return a result, and those results are collected into the list that map returns. If what you want is to modify a shared object, then you need a manager, which runs a server process that holds the object and hands out proxies to it. Both approaches are covered in the sections that follow.
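The per-process copies can be observed directly. The following is a minimal sketch, written for Python 3 (an assumption, since the question targets 2.7), with the hypothetical names `state` and `record`: each worker reports its process ID and its own view of a module-level dict, while the parent's dict is never touched.

```python
import os
from multiprocessing import Pool

# Module-level dict: every worker process ends up with its own copy.
state = {}

def record(argument):
    """Store the argument in this process's copy and report what it sees."""
    state[argument] = argument
    return os.getpid(), dict(state)

if __name__ == "__main__":
    with Pool(2) as pool:
        for pid, seen in pool.map(record, ["a", "b", "c"]):
            print("worker", pid, "sees", seen)
    # record() never ran in the parent, so the parent's copy is still empty.
    print("parent", os.getpid(), "sees", state)
```

The worker process IDs differ from the parent's, which is the reason the final print in the question shows an empty dict.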

Encapsulating a shared object

from multiprocessing import Pool 
from multiprocessing import Manager
import sys

class encapsulation:
   def __init__(self):
       self.member_dict = {}
   def update_dict(self,index,value):
       self.member_dict[index] = value

encaps = encapsulation()

def method(argument):
   encaps.update_dict(argument,argument)
   # print encaps.member_dict       

manager = Manager()
encaps.member_dict = manager.dict()

p = Pool()
p.map(method,sys.argv[1:])

print encaps.member_dict

Output:

$ python mp.py a b c
{'a': 'a', 'c': 'c', 'b': 'b'}

I would suggest not setting the shared object as the member attribute at all. Instead, pass it in as an argument, or keep the shared object separate and copy its values into your own dict afterwards. The shared object should not be kept around long term: it is a proxy that only works while the manager process is alive, so copy its contents out and then discard it:

# copy the values to a regular dict
encaps.member_dict = encaps.member_dict.copy()

But this might even be better:

class encapsulation:
   def __init__(self):
       self.member_dict = {}
   # normal dict update
   def update_dict(self,d):
       self.member_dict.update(d)

encaps = encapsulation()

manager = Manager()
results_dict = manager.dict()

# pass in the shared object only
def method(argument):
   results_dict[argument] = argument    

p = Pool()
p.map(method,sys.argv[1:])

encaps.update_dict(results_dict)
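A variation on the same idea passes the shared dict to the worker explicitly rather than relying on a global. This is a sketch under assumptions: it targets Python 3, uses `functools.partial` to bind the proxy, and the names `record` and `collect` are hypothetical. It also copies the proxy's contents into a regular dict before the manager shuts down.

```python
import sys
from functools import partial
from multiprocessing import Manager, Pool

def record(shared, argument):
    """Runs in a worker: write one entry through the manager proxy."""
    shared[argument] = argument

def collect(arguments):
    """Fill a manager dict from a pool, then return a plain dict copy."""
    with Manager() as manager:
        shared = manager.dict()
        with Pool() as pool:
            pool.map(partial(record, shared), arguments)
        # Copy out while the manager is still alive; the proxy is
        # unusable once the with-block ends.
        return shared.copy()

if __name__ == "__main__":
    # Falls back to sample arguments when none are given on the command line.
    print(collect(sys.argv[1:] or ["a", "b", "c"]))
```

The result of `collect` can then be handed to `encaps.update_dict` just like `results_dict` above.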

Using Pool.map as intended

If you were using the map to return values, it might look like this:

def method(argument):
   encaps.update_dict(argument,argument)
   return encaps.member_dict

p = Pool()
results = p.map(method,sys.argv[1:]) 
print results
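# [{'a': 'a'}, {'b': 'b'}, {'c': 'c'}]
# (a worker that handles more than one argument returns a dict that also contains the other keys it handled)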

You would need to combine the results into your dict again:

for result in results:
    encaps.member_dict.update(result)
print encaps.member_dict
# {'a': 'a', 'c': 'c', 'b': 'b'}
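Since a worker's dict may already hold keys from other calls, an alternative is to have each call return only its own key and value and let the parent build the dict. The sketch below assumes Python 3 and uses the hypothetical function name `compute`; nothing in the workers touches shared state.

```python
import sys
from multiprocessing import Pool

def compute(argument):
    """Runs in a worker: return this call's own (key, value) pair."""
    return argument, argument

if __name__ == "__main__":
    # Falls back to sample arguments when none are given on the command line.
    arguments = sys.argv[1:] or ["a", "b", "c"]
    with Pool() as pool:
        pairs = pool.map(compute, arguments)
    # map returns results in input order; dict() consumes the pairs.
    member_dict = dict(pairs)
    print(member_dict)
```

Here the parent is the only process that ever writes to the dict, so no manager is needed.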