How to use classes with Python multiprocessing?

Posted 2019-02-27 18:26

Question:

Here's some sample code that reads a file and adds up each line. It is supposed to add up all the numbers from 0 to 19. However, I always get a result of 0.

I can see that intermediate calculations are succeeding, so why is the final result 0?

Is there a better way to do this? I am trying to do more calculations on a larger, more complex input file, and store some statistics as I go.

import multiprocessing
import StringIO

class Total():
    def __init__(self):
        self.total = 0

    def add(self, number):
        self.total += int(number)

    def __str__(self):
        return str(self.total)

total = Total()

def f(input):
    total.add(input)

# Create mock file
mock_file = StringIO.StringIO()
for i in range(20):
    mock_file.write("{}\n".format(i))
mock_file.seek(0)

# Compute
pool = multiprocessing.Pool(processes=4)
pool.map(f, mock_file)

print total

# Cleanup
mock_file.close()

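Answer 1:

You can accomplish this using shared memory with multiprocessing.Value. Just change your Total class to the following (this relies on the worker processes inheriting the global total, as they do with the fork start method):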

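class Total():
    def __init__(self):
        # 'd' is a shared C double, so the total prints as a float; use 'i' for a C int
        self.total = multiprocessing.Value('d', 0)

    def add(self, number):
        # += is a read-modify-write and is not atomic, so hold the Value's lock
        with self.total.get_lock():
            self.total.value += int(number)

    def __str__(self):
        return str(self.total.value)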


Answer 2:

Each subprocess calling f updates its own copy of total, so the main process's total is not affected.

You can have each subprocess return the result of its computation (in your mock example, that's just the input, unchanged), and then accumulate it in the main process. E.g.:

def f(input):
  return input

results = pool.map(f, mock_file)
for res in results:
  total.add(res)
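
Put together as a self-contained Python 3 script, the return-and-accumulate approach might look like the sketch below. It uses io.StringIO in place of the Python 2 StringIO module, and the helper names parse_line and total_of are assumptions made for this example.

```python
import io
import multiprocessing


def parse_line(line):
    """Worker: turn one line of text into an int and hand it back."""
    return int(line)


def total_of(lines, processes=4):
    """Parse lines in a pool, then add up the returned values in the parent."""
    with multiprocessing.Pool(processes=processes) as pool:
        return sum(pool.map(parse_line, lines))


if __name__ == "__main__":
    # Mock file holding the numbers 0..19, one per line, as in the question.
    mock_file = io.StringIO("".join("{}\n".format(i) for i in range(20)))
    print(total_of(mock_file))
```

Because the workers only return values, no state is shared between processes and no locking is needed. To collect more statistics, a worker could return a tuple or dict per line and the parent could merge those instead of calling sum.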