Writing to a dictionary of objects in parallel

Posted 2019-07-20 17:25

I have a dictionary of objects, and I would like to populate this dictionary using the multiprocessing package. This bit of code runs "Run" many times in parallel.

from multiprocessing import Process

Data = dict()
for i in range(N):  # N is the number of simulations
    Data[i] = dataobj(i)  # dataobj is a class I have defined elsewhere
    proc = Process(target=Run, args=(i, Data[i]))
    proc.start()

Where "Run" does some simulations, and saves the output in the dataobj object

def Run(i, out):
    [...some code to run simulations....]
    out.extract(file)

My code creates a dictionary of objects, and then modifies the objects in that dictionary in parallel. Is this possible, or do I need to acquire a lock every time I modify an object in the shared dictionary?

1 Answer
三岁会撩人
Answered 2019-07-20 18:03

Basically, since you're using multiprocessing, each of your processes works on its own copy of the original dictionary of objects, and thus populates a different one. What the multiprocessing package handles for you is the messaging of Python objects between processes, to make things less painful.
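For illustration, here is a minimal, self-contained sketch (the class and function names are made up) showing that an object passed to a Process is copied into the child, so changes made there never reach the parent's dictionary:

from multiprocessing import Process

class DataObj:
    def __init__(self, i):
        self.result = None        # to be filled in by the simulation

def run(i, obj):
    obj.result = i * i            # mutates the child's copy only

if __name__ == "__main__":
    data = {i: DataObj(i) for i in range(3)}
    procs = [Process(target=run, args=(i, data[i])) for i in data]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print([data[i].result for i in data])  # [None, None, None] -- the parent's objects are untouched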

A good design for your problem is to have the main process handle populating the dictionary and have its child processes handle the work, then use queues to exchange data between the child processes and the master process.

As a general design idea, here's something that could be done:

from multiprocessing import Process, Queue

# one (input, output) queue pair per worker process
queues = [(Queue(), Queue()), (Queue(), Queue())]

def simulate(qin, qout):
    while not qin.empty():
        data = qin.get()
        # work with the data
        qout.put(data)
    # when the input queue is empty, the process ends

Process(target=simulate, args=(queues[0][0], queues[0][1])).start()
Process(target=simulate, args=(queues[1][0], queues[1][1])).start()

processed_data_list = []

# first send the data to be processed to the child processes
while data.there_is_more_to_process():
    # here you have to adapt to your context how you want to split the load between your processes
    queues[0][0].put(data.pop_some_data())
    queues[1][0].put(data.pop_some_data())

# then for each process' (input, output) queue pair
for qin, qout in queues:
    # you populate your output data list (or dict or whatever)
    while not qout.empty():
        processed_data_list.append(qout.get())
# here again, you have to adapt to your context how you handle the data sent
# back from the child processes.

Take it only as a design idea, though, as this code has a few design flaws that will get naturally solved when working with real data and processing functions.
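Applied to your case, a possible adaptation (only a sketch; the simulation body and the number of runs N are placeholders for your own code) is to keep the dictionary in the parent and have each child send back a (key, result) pair on a single queue:

from multiprocessing import Process, Queue

def Run(i, qout):
    # ...some code to run simulations...
    result = i * i            # placeholder for the real simulation output
    qout.put((i, result))     # send the result back, tagged with its key

if __name__ == "__main__":
    N = 4                     # placeholder: number of simulations
    qout = Queue()
    procs = [Process(target=Run, args=(i, qout)) for i in range(N)]
    for p in procs:
        p.start()

    Data = {}
    for _ in range(N):           # expect exactly one result per process
        i, result = qout.get()   # blocks until a result arrives
        Data[i] = result         # only the parent writes to Data, so no lock is needed

    for p in procs:
        p.join()              # join after draining the queue to avoid blocking

If the result really needs to live inside your dataobj, the parent can store it into Data[i] after receiving it; since only the parent process touches the dictionary, no locking is required.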
