I would like to parallelize a for loop in Octave on a single machine (as opposed to a cluster). A while ago I asked a question about a parallel version of Octave ("parallel computing in octave"),
and the answer suggested that I download a parallel computing package, which I did. The package seems largely geared to cluster computing. It does mention single-machine parallel computing, but it is not clear on how to run even a simple parallel loop.
I also found another question on SO about this, "Running portions of a loop in parallel with Octave?", but it did not have a good answer for parallelizing loops in Octave.
Does anyone know where I can find an example of running a for loop in parallel in Octave?
Now there is pararrayfun.
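As a minimal sketch of how a simple loop might be handed to pararrayfun: this assumes the Octave Forge parallel package is installed and, as far as I know, a Unix-like system, since the package starts subprocesses. The loop body `body` and the index range are made up for illustration.

```octave
pkg load parallel   % assumes the Octave Forge "parallel" package is installed

% Hypothetical loop body, written as a function of the loop index.
body = @(i) sum((1:i) .^ 2);

idx = 1:100;

% Serial loop, kept for reference.
serial_result = zeros(size(idx));
for k = 1:numel(idx)
  serial_result(k) = body(idx(k));
endfor

% Parallel version: nproc() is a core Octave function that returns the
% number of available processors; pararrayfun splits idx among that many
% subprocesses and collects the scalar results.
par_result = pararrayfun(nproc(), body, idx);

disp(isequal(par_result(:), serial_result(:)));
```

Each call to `body` has to be independent of the others for this to be valid; a loop whose iterations depend on earlier iterations cannot be split up this way.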
Usage examples can be found here: http://wiki.octave.org/Parallel_package
Octave loops are slow, slow, slow, and you're far better off expressing things in terms of array-wise operations. Let's take the example of evaluating a simple trig function over a 2D domain, as in this 3D Octave graphics example (but with a more realistic number of points for computation, as opposed to plotting):
vectorized.m:
Converting it to for loops gives us forloops.m:
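A minimal sketch of the two versions side by side. The grid (401 x 401 points on [-2, 2] in each direction) and the function sin(x^2 - y^2) are assumptions chosen for illustration, not taken from the text above, and the measured ratio will depend on your machine and Octave version.

```octave
% Assumed grid and function, for illustration only.
x = linspace(-2, 2, 401);
y = linspace(-2, 2, 401);

% Vectorized version: build the 2D grid once, then evaluate array-wise.
tic;
[xx, yy] = meshgrid(x, y);
z_vec = sin(xx.^2 - yy.^2);
t_vec = toc;

% Loop version: evaluate the same function one element at a time.
tic;
z_loop = zeros(numel(y), numel(x));
for i = 1:numel(y)
  for j = 1:numel(x)
    z_loop(i, j) = sin(x(j)^2 - y(i)^2);
  endfor
endfor
t_loop = toc;

printf("vectorized: %.4f s, loops: %.4f s, ratio: %.0f\n", ...
       t_vec, t_loop, t_loop / t_vec);
```

Both versions fill the same array; `meshgrid(x, y)` returns `xx(i, j) = x(j)` and `yy(i, j) = y(i)`, which is why the loop indexes `x` by column and `y` by row.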
Note that already the vectorized version "wins" in being simpler and clearer to read, but there's another important advantage, too; the timings are dramatically different:
So if you were using for loops, and you had perfect parallelism with no overhead, you'd have to spread the work across 119 processors just to break even with the non-for-loop version!
Don't get me wrong, parallelism is great, but first get things working efficiently in serial.
Almost all of Octave's built-in functions are already vectorized in the sense that they operate equally well on scalars or entire arrays, so it's often easy to convert things to array operations instead of doing things element by element. For those times when it's not so easy, you'll generally find that utility functions already exist to help you (like meshgrid, which generates a 2D grid from the Cartesian product of two vectors).
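A small illustration of both points, using tiny made-up vectors so the results are easy to check by eye:

```octave
% Built-ins accept a scalar or a whole array.
a = sqrt(4);            % 2
b = sqrt([1 4 9]);      % [1 2 3]

% meshgrid expands two vectors into matching 2D coordinate arrays.
x = [1 2 3];
y = [10 20];
[xx, yy] = meshgrid(x, y);
% xx = [ 1  2  3;  1  2  3]
% yy = [10 10 10; 20 20 20]

% Any element-wise expression now covers every (x, y) pair at once.
s = xx + yy;            % [11 12 13; 21 22 23]
disp(s);
```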
I am computing a large number of RGB histograms, and I need to use explicit loops to do it, so the computation of each histogram takes a noticeable amount of time. For this reason, running the computations in parallel makes sense. In Octave there is an (experimental) function, parcellfun, written by Jaroslav Hajek, that can be used to do this.
My original loop
To use parcellfun, I need to refactor the body of my loop into a separate function.
then I can call it like this
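A minimal sketch of the pattern, under some assumptions: the parallel package is installed, the system is Unix-like, and the per-item function `loop_hist` and the synthetic "images" are hypothetical stand-ins for the real image loading and RGB histogram code.

```octave
1;  % first statement is not a function definition, so this file is a script
pkg load parallel   % assumes the Octave Forge "parallel" package is installed

% Hypothetical stand-in for the loop body: a loop-based 8-bin histogram.
function h = loop_hist(img)
  h = zeros(1, 8);
  for k = 1:numel(img)
    b = floor(img(k) / 32) + 1;   % values 0..255 map to bins 1..8
    h(b) += 1;
  endfor
endfunction

% Synthetic data in place of image files: 16 arrays of 200 x 200 values.
images = cell(1, 16);
for c = 1:numel(images)
  images{c} = randi([0 255], 200, 200);
endfor

% One call per cell element, spread over 4 subprocesses. With
% "UniformOutput" set to false the results come back as a cell array,
% so the layout of the final matrix is chosen explicitly below.
res = parcellfun(4, @loop_hist, images, "UniformOutput", false);
histograms = vertcat(res{:});     % one row per image: 16 x 8

% Serial reference for comparison.
ref = cellfun(@loop_hist, images, "UniformOutput", false);
disp(isequal(histograms, vertcat(ref{:})));
```

Collecting the results as a cell array and concatenating them by hand avoids any surprise about the orientation of the combined output.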
I did a small benchmark on my computer. It has 4 physical cores with Intel Hyper-Threading enabled.
My original code
With parcellfun
The results from the parallel and serial versions were the same (only transposed).
When I repeated this several times, the running times were nearly identical each time. The parallel version ran in around 30 seconds (± approx. 2 s) with 4, 8, and 16 subprocesses.
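A rough sketch of how such a comparison could be timed for different numbers of subprocesses. It assumes the parallel package is installed; the workload `busy` and the job sizes are hypothetical and only stand in for the histogram code. On a machine with 4 physical cores, one would expect little gain from going beyond 4 subprocesses for CPU-bound work, which would be consistent with the timings above.

```octave
1;  % first statement is not a function definition, so this file is a script
pkg load parallel   % assumes the Octave Forge "parallel" package is installed

% Hypothetical CPU-bound job built on an explicit loop.
function s = busy(n)
  s = 0;
  for k = 1:n
    s += sin(k)^2;
  endfor
endfunction

jobs = num2cell(repmat(2e5, 1, 32));   % 32 equally sized jobs

% Serial baseline.
tic;
serial = cellfun(@busy, jobs);
printf("serial:          %.2f s\n", toc);

% Same jobs with different numbers of subprocesses.
for np = [2 4 8]
  tic;
  par = parcellfun(np, @busy, jobs, "VerboseLevel", 0);
  printf("%2d subprocesses: %.2f s\n", np, toc);
endfor
```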