I have a function that performs a simulation and returns an array in string format.
I want to run the simulation (the function) for varying input parameter values, over 10000 possible input values, and write the results to a single file.
I am using multiprocessing, specifically the pool.map function, to run the simulations in parallel.
Since running the simulation function over 10000 times takes a very long time, I would really like to track the progress of the entire operation.
I think the problem with my current code below is that pool.map runs the function 10000 times without any progress tracking during those operations. Only once the parallel processing has finished all 10000 simulations (which could take hours to days) do I start tracking, while the results are being saved to a file. So this is not really tracking the progress of the pool.map operation.
Is there an easy fix to my code that will allow progress tracking?
import multiprocessing
import numpy as np

def simFunction(input):
    # Does some simulation and outputs simResult
    return str(simResult)

# Parallel processing
inputs = np.arange(0, 10000, 1)

if __name__ == "__main__":
    numCores = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=numCores)
    t = pool.map(simFunction, inputs)
    with open('results.txt', 'w') as out:
        print("Starting to simulate " + str(len(inputs)) + " input values...")
        counter = 0
        for i in t:
            out.write(i + '\n')
            counter = counter + 1
            if counter % 100 == 0:
                print(str(counter) + " of " + str(len(inputs)) + " input values simulated")
    print('Finished!!!!')
I think what you need is a log file.
I would recommend you to use the logging module, which is part of the Python standard library. But unfortunately logging is not multiprocessing-safe, so you can't use it out of the box in your app.
So you will need to use a multiprocessing-safe log handler, or implement your own using a Queue or locks along with the logging module.
There's a lot of discussion about this on Stack Overflow. This, for instance: How should I log while using multiprocessing in Python?
If most of the CPU load is in the simulation function and you are not going to use log rotation, you can probably use a simple lock mechanism like this:
You can "tail -f" your log file while the simulation is running to watch the progress as it happens.
Tried on Windows and Linux.
Hope this helps.
If you use an iterated map function, it's pretty easy to keep track of progress. Or, you can use an asynchronous map. Here I'll do things a little differently, just to mix it up.

Note that I'm using pathos.multiprocessing instead of multiprocessing. It's just a fork of multiprocessing that enables you to do map functions with multiple inputs, has much better serialization, and allows you to execute map calls anywhere (not just in __main__). You could use multiprocessing to do the above as well; however, the code would be very slightly different.

Either an iterated or asynchronous map will enable you to write whatever code you want for better process tracking. For example, pass a unique "id" to each job and watch which ones come back, or have each job return its process id. There are lots of ways to track progress and processes, but the above should give you a start.

You can get pathos here: https://github.com/uqfoundation
There is no "easy fix". map is all about hiding implementation details from you, and in this case you want details. That is, things become a little more complex, by definition. You need to change the communication paradigm. There are many ways to do so.

One is: create a Queue for collecting your results, and let your workers put results into this queue. You can then, from within a monitoring thread or process, look at the queue and consume the results as they come in. While consuming, you can analyze them and generate log output. This might be the most general way to keep track of progress: you can respond to incoming results in any way, in real time.
A simpler way might be to slightly modify your worker function and generate log output in there. By carefully analyzing the log output with external tools (such as grep and wc), you can come up with very simple means of keeping track.