Share object state across processes?

Posted 2019-05-30 09:37

Question:

In the code below, how can the Starter object read gen.vals? It seems that a different object is created and its state is updated, but Starter never sees the changes. Also, how would the solution apply if self.vals were a dictionary, or any other kind of object?

import multiprocessing
import time

class Generator(multiprocessing.Process):
    def __init__(self):
        self.vals = []
        super(Generator, self).__init__()

    def run(self):
        i = 0
        while True:
            time.sleep(1)
            self.vals.append(i)
            print('In Generator ', self.vals)  # prints growing list
            i += 1

class Starter():
    def do_stuff(self):
        gen = Generator()
        gen.start()
        while True:
            print('In Starter ', gen.vals)  # prints empty list
            time.sleep(1)

if __name__ == '__main__':
    starter = Starter()
    starter.do_stuff()

Output:

In Starter  []
In Starter  []
In Generator  [0]
In Starter  []
In Generator  [0, 1]
In Starter  []
In Generator  [0, 1, 2]
In Starter  []
In Generator  [0, 1, 2, 3]
In Starter  []
In Generator  [0, 1, 2, 3, 4]
In Starter  []
In Generator  [0, 1, 2, 3, 4, 5]
In Starter  []
In Generator  [0, 1, 2, 3, 4, 5, 6]
In Starter  []
In Generator  [0, 1, 2, 3, 4, 5, 6, 7]

Answer 1:

When you start a process, it runs in a separate context with its own memory, so nothing is shared with the parent by default. Whatever your run() method does is therefore not reflected in your main process: Python spawns or forks a new process, and run() is called there on that process's own copy of your Generator instance. Any changes to that copy's state stay in the child process.
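
A minimal sketch of this behaviour, using a finite amount of work so that the script terminates. The Counter class and the parent_view function are hypothetical names made up for the illustration; they are not part of the question's code.

```python
import multiprocessing
import os


class Counter(multiprocessing.Process):
    """Hypothetical example class that mutates its own state in run()."""

    def __init__(self):
        super().__init__()
        self.vals = []

    def run(self):
        # Executed in the child process, on the child's copy of this object.
        self.vals.extend([0, 1, 2])
        print('child pid', os.getpid(), 'vals', self.vals)


def parent_view():
    """Start the child, wait for it, and return what the parent still sees."""
    proc = Counter()
    proc.start()
    proc.join()
    return proc.vals


if __name__ == '__main__':
    print('parent pid', os.getpid(), 'vals', parent_view())
```

The child reports [0, 1, 2], while the parent still sees an empty list after join(), because the two processes hold separate copies of the object.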

If you want to pass data around, you need to use multiprocessing-aware structures that serialize and deserialize the data between processes and communicate changes back and forth. For example:

import multiprocessing
import time

class Generator(multiprocessing.Process):
    def __init__(self):
        self._vals = []  # keeps the internal state
        self.vals = multiprocessing.Queue()  # a queue for the exchange
        super(Generator, self).__init__()

    def run(self):
        i = 0
        while True:
            time.sleep(1)
            self._vals.append(i)  # update the internal state
            print('In Generator ', self._vals) # prints growing list
            self.vals.put(list(self._vals))  # put a snapshot copy on the queue
            i += 1

class Starter():
    def do_stuff(self):
        gen = Generator()
        gen.start()
        while True:
            print('In Starter ', gen.vals.get()) # print what's in the queue
            time.sleep(1)

if __name__ == '__main__':
    starter = Starter()
    starter.do_stuff()

This will print:

In Generator  [0]
In Starter  [0]
In Generator  [0, 1]
In Starter  [0, 1]
In Generator  [0, 1, 2]
In Starter  [0, 1, 2]
In Generator  [0, 1, 2, 3]
In Starter  [0, 1, 2, 3]
etc.

If you need more complex or semi-concurrent data modifications, or have to deal with more structured data, look at the structures supported by multiprocessing.Manager. For very complex cases, I'd recommend an in-memory database such as Redis as the means of inter-process data exchange. Or, if you prefer to manage the details yourself, ØMQ is always a good option.
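
As a rough sketch of the multiprocessing.Manager route, which also covers the dictionary case from the question: the manager hands out proxy objects for a list and a dict, and both processes operate on those proxies. The run_shared function, the stats dictionary and the count parameter are assumptions added for illustration, and the loop is finite so that the example terminates.

```python
import multiprocessing


class Generator(multiprocessing.Process):
    def __init__(self, vals, stats, count):
        super().__init__()
        self.vals = vals      # proxy to a list held by the manager process
        self.stats = stats    # proxy to a dict held by the manager process
        self.count = count    # hypothetical parameter: how many values to produce

    def run(self):
        for i in range(self.count):
            # Each call on a proxy is forwarded to the manager process.
            self.vals.append(i)
            self.stats['last'] = i
            self.stats['total'] = self.stats.get('total', 0) + i


def run_shared(count=5):
    """Run the generator to completion and return plain copies of the shared data."""
    with multiprocessing.Manager() as manager:
        vals = manager.list()
        stats = manager.dict()
        gen = Generator(vals, stats, count)
        gen.start()
        gen.join()
        # Copy into ordinary objects before the manager shuts down.
        return list(vals), dict(stats)


if __name__ == '__main__':
    vals, stats = run_shared()
    print('In Starter ', vals, stats)
```

Here the parent sees the values the child produced, because both sides talk to the same objects living in the manager process. One caveat from the multiprocessing documentation: changing a mutable object stored inside a proxy (for example, appending to a plain list kept as a dict value) is not propagated unless that value is reassigned through the proxy.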