For C++, we can use OpenMP to do parallel programming; however, OpenMP will not work for Python. What should I do if I want to parallelize some parts of my Python program?
The structure of the code may be considered as:
solve1(A)
solve2(B)
where solve1 and solve2 are two independent functions. How can I run this kind of code in parallel instead of sequentially, in order to reduce the running time?
Hope someone can help me. Thanks very much in advance.
The code is:
def solve(Q, G, n):
    i = 0
    tol = 10 ** -4
    while i < 1000:
        inneropt, partition, x = setinner(Q, G, n)
        outeropt = setouter(Q, G, n)
        if (outeropt - inneropt) / (1 + abs(outeropt) + abs(inneropt)) < tol:
            break
        node1 = partition[0]
        node2 = partition[1]
        G = updateGraph(G, node1, node2)
        if i == 999:
            print "Maximum iteration reached"
        i = i + 1  # advance the counter so the loop can terminate
    print inneropt
where setinner and setouter are two independent functions. That's the part I want to parallelize...
You can use the multiprocessing module. For this case I might use a processing pool:
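A minimal sketch of that idea, assuming solve1, solve2, A and B are the names from your question:

import multiprocessing

pool = multiprocessing.Pool()
result1 = pool.apply_async(solve1, [A])    # evaluate "solve1(A)" asynchronously
result2 = pool.apply_async(solve2, [B])    # evaluate "solve2(B)" asynchronously
answer1 = result1.get(timeout=10)          # wait up to 10 seconds for the result
answer2 = result2.get(timeout=10)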
This will spawn processes that can do generic work for you. Since we did not pass the processes argument, it will spawn one process for each CPU core on your machine. Each CPU core can execute one process simultaneously. If you want to map a list to a single function you would do this:
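For instance, taking solve1 from your question as the example function:

args = [A, B]
results = pool.map(solve1, args)   # runs solve1(A) and solve1(B) in parallel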
Don't use threads, because the GIL locks any operations on Python objects.
CPython uses the Global Interpreter Lock (GIL), which makes parallel programming a bit more interesting than it is in C++.
This topic has several useful examples and descriptions of the challenge:
Python Global Interpreter Lock (GIL) workaround on multi-core systems using taskset on Linux?
In some cases, it's possible to automatically parallelize loops using Numba, though it only works with a small subset of Python:
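As a rough sketch of the idea (a toy reduction, not the code from the question):

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def parallel_sum(A):
    s = 0.0
    # with parallel=True, prange splits the iterations across threads;
    # without it, prange behaves exactly like range
    for i in prange(A.shape[0]):
        s += A[i]
    return s

print(parallel_sum(np.arange(10 ** 6, dtype=np.float64)))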
Unfortunately, it seems that Numba only works with NumPy arrays, not with other Python objects. In theory, it might also be possible to compile Python to C++ and then automatically parallelize it using the Intel C++ compiler, though I haven't tried this yet.
The solution, as others have said, is to use multiple processes. Which framework is more appropriate, however, depends on many factors. In addition to the ones already mentioned, there is also charm4py and mpi4py (I am the developer of charm4py).
There is a more efficient way to implement the above example than using the worker pool abstraction. The main loop sends the same parameters (including the complete graph G) over and over to workers in each of the 1000 iterations. Since at least one worker will reside on a different process, this involves copying and sending the arguments to the other process(es). This could be very costly depending on the size of the objects. Instead, it makes sense to have workers store state and simply send the updated information. For example, in charm4py this can be done like this:
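This is only a sketch, not a tested drop-in replacement: load_problem is a hypothetical placeholder for however Q, G and n are built, and setinner, setouter and updateGraph are the question's functions.

from charm4py import charm, Chare

class Worker(Chare):

    def __init__(self, Q, G, n):
        # each worker keeps its own copy of the state, so the main loop
        # never has to re-send Q, G and n
        self.Q = Q
        self.G = G
        self.n = n

    def inner(self, node1, node2):
        if node1 is not None:
            self.G = updateGraph(self.G, node1, node2)  # apply only the small update
        return setinner(self.Q, self.G, self.n)

    def outer(self, node1, node2):
        if node1 is not None:
            self.G = updateGraph(self.G, node1, node2)
        return setouter(self.Q, self.G, self.n)

def main(args):
    Q, G, n = load_problem()                           # hypothetical setup helper
    worker_a = Chare(Worker, args=[Q, G, n], onPE=0)   # same process as the main loop
    worker_b = Chare(Worker, args=[Q, G, n], onPE=1)   # a different process
    node1 = node2 = None
    tol = 10 ** -4
    for i in range(1000):
        result_a = worker_a.inner(node1, node2, ret=True)   # ret=True returns a future
        result_b = worker_b.outer(node1, node2, ret=True)
        inneropt, partition, x = result_a.get()   # block until worker A is done
        outeropt = result_b.get()                 # block until worker B is done
        if (outeropt - inneropt) / (1 + abs(outeropt) + abs(inneropt)) < tol:
            break
        node1, node2 = partition[0], partition[1]
    charm.exit()

charm.start(main)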
Note that for this example we really only need one worker. The main loop could execute one of the functions and have the worker execute the other. But my code helps to illustrate a couple of things: worker A lives in the same process as the main loop, so while result_a.get() is blocked waiting on the result, worker A does the computation in that same process, and its arguments never have to be copied to another process.

This can be done very elegantly with Ray.
To parallelize your example, you'd need to define your functions with the @ray.remote decorator, and then invoke them with .remote:
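A minimal sketch; the function bodies below are placeholders standing in for your actual solve1 and solve2.

import ray

ray.init()

@ray.remote
def solve1(a):
    return 1    # placeholder body

@ray.remote
def solve2(b):
    return 2    # placeholder body

# .remote() returns immediately with object IDs; both tasks run in parallel.
x_id = solve1.remote(1)
y_id = solve2.remote(2)
# ray.get blocks until both results are available.
print(ray.get([x_id, y_id]))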
There are a number of advantages of this over the multiprocessing module.
These function calls can be composed together, e.g.,
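For example, with a hypothetical third task solve3 that consumes the previous results:

@ray.remote
def solve3(x, y):
    return x + y    # illustrative composition of the two results

x_id = solve1.remote(1)
y_id = solve2.remote(2)
# solve3 will not start until solve1 and solve2 have finished;
# Ray passes their results in automatically.
z_id = solve3.remote(x_id, y_id)
print(ray.get(z_id))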
Note that Ray is a framework I've been helping develop.