I want to find the optimal parameters `i`, `j`, `k`, each in `0..99`, for a given computational problem, and I need to run:
```python
for i in range(100):
    for j in range(100):
        for k in range(100):
            dothejob(i, j, k)  # 1 second per computation
```
This takes a total of 10^6 seconds, i.e. 11.5 days.
I started doing it by splitting the work among 4 processes (to use 100% of my 4-core CPU):
```python
for i in range(100):
    if i % 4 != 0:  # replace 0 with 1, 2, or 3 in the parallel scripts #2, #3, #4
        continue
    for j in range(100):
        for k in range(100):
            dothejob(i, j, k)
        with open('done.log', 'a+') as f:  # log what has been done
            f.write("%i %i\n" % (i, j))
```
But I have problems with this approach:
First, to launch the four processes I have to run `python script.py`, then open `script.py` and replace line 2 with `if i % 4 != 1`, then run `python script.py` again, then open `script.py` and replace line 2 with `if i % 4 != 2`, then run `python script.py`, then open `script.py` and replace line 2 with `if i % 4 != 3`, then run `python script.py` one last time.
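One obvious cleanup would be to read the worker number from `sys.argv` instead of editing the script each time; a minimal sketch of what I mean, with `dothejob` stubbed out:

```python
import sys

def dothejob(i, j, k):
    pass  # stand-in for the real 1-second computation

worker = int(sys.argv[1])  # 0, 1, 2, or 3

for i in range(100):
    if i % 4 != worker:  # each process handles every 4th value of i
        continue
    for j in range(100):
        for k in range(100):
            dothejob(i, j, k)
        with open('done.log', 'a') as f:  # log each finished (i, j)
            f.write("%i %i\n" % (i, j))
```

The four processes would then be launched as `python script.py 0`, `python script.py 1`, `python script.py 2`, and `python script.py 3`, with no editing in between. But the bigger problem remains: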
Second, say the loop is interrupted (the computer needs a reboot, the script crashes, or anything else). At least `done.log` records every `(i, j)` pair already done, so we don't have to start again from zero, but there is no easy way to resume the work. (OK, we can open `done.log`, parse it, and discard the `(i, j)` pairs already done when restarting the loops; I started doing this, but I had the feeling I was reinventing, in a dirty way, something that already exists.)
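Concretely, the restart logic I started writing looks roughly like the sketch below (it assumes `done.log` holds one `i j` pair per line, as in the script above, and stubs out `dothejob`):

```python
import os

def dothejob(i, j, k):
    pass  # stand-in for the real 1-second computation

# read back every (i, j) pair recorded by previous runs
done = set()
if os.path.exists('done.log'):
    with open('done.log') as f:
        for line in f:
            i, j = line.split()
            done.add((int(i), int(j)))

for i in range(100):
    if i % 4 != 0:
        continue
    for j in range(100):
        if (i, j) in done:  # skip work finished before the interruption
            continue
        for k in range(100):
            dothejob(i, j, k)
        with open('done.log', 'a') as f:
            f.write("%i %i\n" % (i, j))
```

This works, but it is exactly the kind of hand-rolled bookkeeping I would rather not reinvent.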
I'm looking for a better solution to this (though, for example, map/reduce is probably overkill for such a little task, and not easy to use in a few lines of Python).
Question: how do I make a computation like `for i in range(100): for j in range(100): for k in range(100): dothejob(i, j, k)` easily splittable among multiple processes and easily resumable (e.g. after a reboot) in Python?
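To make the shape of solution I'm hoping for more concrete: the splitting half looks like exactly what the standard library's `multiprocessing.Pool` provides (rough sketch below, again with `dothejob` stubbed out); what I don't see is how to combine it cleanly with resumability.

```python
from itertools import product
from multiprocessing import Pool

def dothejob(i, j, k):
    pass  # stand-in for the real 1-second computation

def work(args):
    dothejob(*args)
    return args  # hand the finished triple back to the parent

if __name__ == '__main__':
    pool = Pool(4)  # one worker process per core
    # imap_unordered yields each triple as soon as some worker finishes it
    for i, j, k in pool.imap_unordered(work, product(range(100), repeat=3)):
        with open('done.log', 'a') as f:  # only the parent writes the log
            f.write("%i %i %i\n" % (i, j, k))
    pool.close()
    pool.join()
```

But `Pool` by itself says nothing about picking up where a previous run left off after a reboot, which is the part I'm missing.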