I have a complicated for loop which performs multiple operations on multiple records per iteration. The loop looks like this:
for i, j, k in zip(is_, js, ks):  # note: `is` is a Python keyword, so that list is renamed
    # declare multiple lists, like:
    a = []
    b = []
    # ...
    if i:
        for item in i:
            values = item['key'].split("--")
            # append the values to the declared lists
            a.append(values[0])
            b.append(values[1])
    # also other operations with j and k (j is a dict of dicts, k a string)
    if "substring" in k:
        for key, v in j["key"].items():  # inner variable renamed so it no longer shadows the outer k
            l = "string"
            t = v
    else:
        for key, v in j["key2"].items():
            l = key
            t = v
    # construct an object with all the lists/params
    content = {
        'sub_content': {
            "a": a,
            "b": b,
            # ...
        }
    }
    # form a tuple; we are interested in this tuple
    data_tuple = (content, t, l)
Considering the above for loop, how do I parallelize it? I've looked into multiprocessing, but I haven't been able to parallelize such a complex loop. I'm also open to suggestions that might perform better here, including parallel language paradigms like OpenMP/MPI/OpenACC.
You can use the Python multiprocessing library. As noted in this excellent answer, you should first figure out whether you need multi-processing or multi-threading.
Bottom line: if your tasks are IO-bound, use multi-threading via multiprocessing.dummy. If you are only doing CPU-intensive work with no IO or shared dependencies, use multiprocessing.
Set up the zip object
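A minimal sketch of the setup, assuming your three inputs are named `is_`, `js`, and `ks` (renamed because `is` is a Python keyword); the shapes of the sample data are assumptions based on the question's code:

```python
# Hypothetical inputs: each i is a list of dicts with a "key" entry,
# each j is a dict of dicts, each k is a string.
is_ = [[{'key': 'x--y'}], [{'key': 'p--q'}]]
js = [{'key': {'m': 1}, 'key2': {'n': 2}},
      {'key': {'m': 3}, 'key2': {'n': 4}}]
ks = ['has substring', 'other']

# zip yields one (i, j, k) tuple per record; each tuple becomes one
# independent unit of work that a pool can process in parallel.
records = list(zip(is_, js, ks))
```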
Simple example function
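A hypothetical stand-in for the real loop body, just to show the shape a pool worker needs: one argument (a single `(i, j, k)` tuple) in, one result out. Any function with this signature can be handed to a pool's `map()`:

```python
def example_function(record):
    # Unpack one (i, j, k) tuple and return a toy result tuple.
    i, j, k = record
    return (len(i), list(j), k.upper())
```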
Set up the multi-threading.
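A sketch of the pool setup using a trivial placeholder worker. `multiprocessing.dummy.Pool` has the same API as `multiprocessing.Pool` but uses threads, so switching between the two later is a one-line change:

```python
from multiprocessing.dummy import Pool  # thread pool; use multiprocessing.Pool for CPU-bound work

def work(record):
    # placeholder worker: unpack one record and combine its fields
    i, j, k = record
    return i + j + k

records = [(1, 2, 3), (4, 5, 6)]

with Pool(4) as pool:  # 4 worker threads; tune to your workload
    results = pool.map(work, records)
```

`pool.map` preserves input order, so `results[n]` always corresponds to `records[n]`.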
Your full example
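Putting it together: a self-contained sketch of the question's loop rewritten as a worker function mapped over the zipped records. `process_record` and the sample inputs are hypothetical; the body mirrors the loop above, with the inner loop variable renamed so it no longer shadows the outer `k`:

```python
from multiprocessing.dummy import Pool  # threads; swap in multiprocessing.Pool for CPU-bound work

def process_record(record):
    """Build (content, t, l) from one (i, j, k) tuple, as in the original loop."""
    i, j, k = record
    a, b = [], []
    if i:
        for item in i:
            values = item['key'].split("--")
            a.append(values[0])
            b.append(values[1])
    l = t = None
    if "substring" in k:
        for key, v in j["key"].items():
            l = "string"
            t = v
    else:
        for key, v in j["key2"].items():
            l = key
            t = v
    content = {'sub_content': {"a": a, "b": b}}
    return (content, t, l)

# Hypothetical sample inputs matching the shapes the question implies.
is_ = [[{'key': 'x--y'}], [{'key': 'p--q'}]]
js = [{'key': {'m': 1}, 'key2': {'n': 2}},
      {'key': {'m': 3}, 'key2': {'n': 4}}]
ks = ['has substring', 'other']

with Pool(4) as pool:
    data_tuples = pool.map(process_record, zip(is_, js, ks))
```

Each element of `data_tuples` is one `data_tuple` from the original loop, in the same order as the input records.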