Python Multiprocessing - apply class method to a l

Posted 2020-02-21 08:22

Question:

Is there a simple way to use Multiprocessing to do the equivalent of this?

for sim in sim_list:
  sim.run()

where the elements of sim_list are "simulation" objects and run() is a method of the simulation class which does modify the attributes of the objects. E.g.:

import subprocess

class simulation:
    def __init__(self):
        self.state = {'done': False}  # create the dict before indexing into it
        self.cmd = "program"
    def run(self):
        subprocess.call(self.cmd)
        self.state['done'] = True

All the sim in sim_list are independent, so the strategy does not have to be thread safe.

I tried the following, which is obviously flawed: the argument is copied into the child process, so the original object is not modified in place.

from multiprocessing import Process

for sim in sim_list:
  b = Process(target=simulation.run, args=[sim])
  b.start()
  b.join()

Answer 1:

One way to do what you want is to have your computing class (simulation in your case) be a subclass of Process. When initialized properly, instances of this class will run in separate processes and you can set off a group of them from a list just like you wanted.

Here's an example, building on what you wrote above:

import multiprocessing
import os
import random
import sys

class simulation(multiprocessing.Process):
    def __init__(self, name):
        # must call this before anything else
        multiprocessing.Process.__init__(self)

        # then any other initialization
        self.name = name
        self.number = 0.0
        sys.stdout.write('[%s] created: %f\n' % (self.name, self.number))

    def run(self):
        sys.stdout.write('[%s] running ...  process id: %s\n' 
                         % (self.name, os.getpid()))

        self.number = random.uniform(0.0, 10.0)
        sys.stdout.write('[%s] completed: %f\n' % (self.name, self.number))

Then just make a list of objects and start each one with a loop:

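if __name__ == '__main__':  # needed where child processes are spawned (e.g. Windows)
    sim_list = [simulation('foo'), simulation('bar')]

    for sim in sim_list:
        sim.start()
    for sim in sim_list:
        sim.join()  # wait for every simulation to finish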

When you run this you should see each object run in its own process. Don't forget to call Process.__init__(self) as the very first thing in your class initialization, before anything else.

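Note that run() executes in the child process, so attributes it sets (such as self.number) are not updated on the parent's copy of the object. I've not included any interprocess communication in this example; you'll have to add that if you need results back in the parent (it wasn't clear from your question whether you needed it or not).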
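A minimal sketch of one way to add that communication, assuming a multiprocessing.Queue is an acceptable channel for sending results back. QueueSimulation and run_all are hypothetical names chosen for illustration; they are not part of the answer above.

```python
import multiprocessing
import random


class QueueSimulation(multiprocessing.Process):
    """Hypothetical variant of the class above that reports its result."""

    def __init__(self, name, results):
        # Initialize the Process machinery before setting other attributes.
        super().__init__(name=name)
        self.results = results  # queue shared with the parent process
        self.number = 0.0

    def run(self):
        # Executes in the child; this assignment is invisible to the parent.
        self.number = random.uniform(0.0, 10.0)
        # Send the value back explicitly.
        self.results.put((self.name, self.number))


def run_all(names):
    """Start one process per name and collect {name: number} in the parent."""
    results = multiprocessing.Queue()
    sims = [QueueSimulation(name, results) for name in names]
    for sim in sims:
        sim.start()
    # Read one result per process before joining, so no child is left
    # blocked while flushing its data to the queue.
    collected = dict(results.get(timeout=30) for _ in sims)
    for sim in sims:
        sim.join()
    return collected


if __name__ == '__main__':
    print(run_all(['foo', 'bar']))
```

The parent reads the numbers from the queue rather than from sim.number, which stays at 0.0 on the parent's copy of each object.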

This approach works well for me, and I'm not aware of any drawbacks. If anyone knows of hidden dangers which I've overlooked, please let me know.

I hope this helps.



Answer 2:

For those working with large data sets, a process pool that consumes the list lazily through Pool.imap may be the better fit:

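import multiprocessing as mp

def run_sim(sim):
    sim.run()    # runs in a worker process, on a pickled copy of sim
    return sim   # the modified copy is pickled back to the parent

pool = mp.Pool(mp.cpu_count())
sim_list = list(pool.imap(run_sim, sim_list))  # imap is lazy; list() consumes it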
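As a hedged, self-contained sketch of the pool approach: the Simulation class below is a hypothetical stand-in for the question's class, and run_sim and run_pool are illustrative names. It assumes the objects can be pickled, since each one is sent to a worker and the modified copy is sent back.

```python
import multiprocessing as mp


class Simulation:
    """Hypothetical stand-in for the question's simulation class."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.state = {'done': False}

    def run(self):
        # A real simulation would call subprocess.call(self.cmd) here.
        self.state['done'] = True


def run_sim(sim):
    # Runs in a worker on a pickled copy; return it so the parent sees changes.
    sim.run()
    return sim


def run_pool(sims, workers=None):
    # imap is lazy and yields results in input order; list() consumes them
    # while the pool is still alive.
    with mp.Pool(workers) as pool:
        return list(pool.imap(run_sim, sims))


if __name__ == '__main__':
    finished = run_pool([Simulation('a'), Simulation('b')])
    print([sim.state['done'] for sim in finished])
```

The objects passed in are left unchanged in the parent; the updated state lives on the copies that run_pool returns, so the caller should replace its list with the returned one.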