I have a dictionary of objects, and I would like to populate this dictionary using the multiprocessing package. This bit of code runs "Run" many times in parallel.
Data=dict()
for i in range:
Data[i]=dataobj(i) #dataobj is a class I have defined elsewhere
proc=Process(target=Run, args=(i, Data[i]))
proc.start()
Where "Run" does some simulations, and saves the output in the dataobj object
def Run(i, out):
[...some code to run simulations....]
out.extract(file)
My code creates a dictionary of objects, and then modifies the objects in that dictionary in parallel. Is this possible, or do I need to acquire a lock every time I modify an object in the shared dictionary?
basically, as you're using multiprocessing, then your processes share copies of the original dictionary of objects, and thus populate different ones. What the multiprocessing package handles for you, is messaging of python objects between processes to make things less painful.
A good design for your problem is to have the main process handling populating the dictionary, and have its children processes handle the work. Then use a queue to exchange data between the children processes and the master process.
As a general design idea, here's something that could be done:
from queue import Queue
queues = [Queue(), Queue()]
def simulate(qin, qout):
while not qin.empty():
data = qin.pop()
# work with the data
qout.put(data)
# when the queue is empty, the process ends
Process(target=simulate, args=(queues[0][0],queues[0][1])).start()
Process(target=simulate, args=(queues[1][0],queues[1][1])).start()
processed_data_list = []
# first send the data to be processed to the children processes
while data.there_is_more_to_process():
# here you have to adapt to your context how you want to split the load between your processes
queues[0].push(data.pop_some_data())
queues[1].push(data.pop_some_data())
# then for each process' queue
for qin, qout in queues:
# you populate your output data list (or dict or whatever)
while not qout.empty:
processed_data_list.append(qout.pop())
# here again, you have to adapt to your context how you handle the data sent
# back from the children processes.
though, take it as only a design idea, as this code has a few design flaws, which will get naturally solved when working with real data and processing functions.