Writing to a dictionary of objects in parallel

Posted 2019-07-20 17:25

I have a dictionary of objects, and I would like to populate this dictionary using the multiprocessing package. This bit of code runs "Run" many times in parallel.

from multiprocessing import Process

Data = dict()
for i in range(N):  # N is the number of simulations
    Data[i] = dataobj(i)  # dataobj is a class I have defined elsewhere
    proc = Process(target=Run, args=(i, Data[i]))
    proc.start()

Where "Run" does some simulations, and saves the output in the dataobj object

def Run(i, out):
    [...some code to run simulations....]
    out.extract(file)

My code creates a dictionary of objects, and then modifies the objects in that dictionary in parallel. Is this possible, or do I need to acquire a lock every time I modify an object in the shared dictionary?

1 Answer
三岁会撩人
Answered 2019-07-20 18:03

Basically, since you're using multiprocessing, each of your processes works on its own copy of the original dictionary of objects, and thus populates a different one. What the multiprocessing package handles for you is the messaging of Python objects between processes, to make things less painful.
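For illustration, here is a minimal, self-contained sketch (the class and function names are made up) showing that an object passed to a Process is copied into the child, so changes made there never reach the parent's dictionary:

from multiprocessing import Process

class DataObj:
    def __init__(self, i):
        self.result = None        # to be filled in by the simulation

def run(i, obj):
    obj.result = i * i            # mutates the child's copy only

if __name__ == "__main__":
    data = {i: DataObj(i) for i in range(3)}
    procs = [Process(target=run, args=(i, data[i])) for i in data]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([data[i].result for i in data])  # [None, None, None] -- the parent's objects are untouched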

A good design for your problem is to have the main process handle populating the dictionary and have its child processes handle the work, then use queues to exchange data between the child processes and the master process.

As a general design idea, here's something that could be done:

from multiprocessing import Process, Queue

# one (input, output) queue pair per worker process
queues = [(Queue(), Queue()), (Queue(), Queue())]

def simulate(qin, qout):
    while not qin.empty():
        data = qin.get()
        # work with the data
        qout.put(data)
    # when the input queue is empty, the process ends

Process(target=simulate, args=(queues[0][0], queues[0][1])).start()
Process(target=simulate, args=(queues[1][0], queues[1][1])).start()

processed_data_list = []

# first send the data to be processed to the child processes
while data.there_is_more_to_process():
    # here you have to adapt to your context how you want to split the load between your processes
    queues[0][0].put(data.pop_some_data())
    queues[1][0].put(data.pop_some_data())

# then for each process' (input, output) queue pair
for qin, qout in queues:
    # you populate your output data list (or dict or whatever)
    while not qout.empty():
        processed_data_list.append(qout.get())
# here again, you have to adapt to your context how you handle the data sent
# back from the child processes.

Take it only as a design idea, though, as this code has a few design flaws that will get naturally solved when working with real data and processing functions.
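Applied to your case, a possible adaptation (only a sketch; the simulation body and the number of runs N are placeholders for your own code) is to keep the dictionary in the parent and have each child send back a (key, result) pair on a single queue:

from multiprocessing import Process, Queue

def Run(i, qout):
    # ...some code to run simulations...
    result = i * i            # placeholder for the real simulation output
    qout.put((i, result))     # send the result back, tagged with its key

if __name__ == "__main__":
    N = 4                     # placeholder: number of simulations
    qout = Queue()
    procs = [Process(target=Run, args=(i, qout)) for i in range(N)]
    for p in procs:
        p.start()

    Data = {}
    for _ in range(N):           # expect exactly one result per process
        i, result = qout.get()   # blocks until a result arrives
        Data[i] = result         # only the parent writes to Data, so no lock is needed

    for p in procs:
        p.join()              # join after draining the queue to avoid blocking

If the result really needs to live inside your dataobj, the parent can store it into Data[i] after receiving it; since only the parent process touches the dictionary, no locking is required.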
