How to nest multiple parfor loops

2019-01-11 08:27发布

问题:

parfor is a convenient way to distribute independent iterations of intensive computations among several "workers". One meaningful restriction is that parfor-loops cannot be nested, and invariably, that is the answer to similar questions like there and there.

Why parallelization across loop boundaries is so desirable

Consider the following piece of code where iterations take a highly variable amount of time on a machine that allows 4 workers. Both loops iterate over 6 values, clearly hard to share among 4.

for row = 1:6
    parfor col = 1:6
        somefun(row, col);
    end
end

It seems like a good idea to choose the inner loop for parfor because individual calls to somefun are more variable than iterations of the outer loop. But what if the run time for each call to somefun is very similar? What if there are trends in run time and we have three nested loops? These questions come up regularly, and people go to extremes.

Pattern needed for combining loops

Ideally, somefun is run for all pairs of row and col, and workers should get busy irrespectively of which iterand is being varied. The solution should look like

parfor p = allpairs(1:6, 1:6)
    somefun(p(1), p(2));
end

Unfortunately, even if I knew which builtin function creates a matrix with all combinations of row and col, MATLAB would complain with an error The range of a parfor statement must be a row vector. Yet, for would not complain and nicely iterate over columns. An easy workaround would be to create that matrix and then index it with parfor:

p = allpairs(1:6, 1:6);
parfor k = 1:size(pairs, 2)
    row = p(k, 1);
    col = p(k, 2);
    somefun(row, col);
end

What is the builtin function in place of allpairs that I am looking for? Is there a convenient idiomatic pattern that someone has come up with?

回答1:

MrAzzman already pointed out how to linearise nested loops. Here is a general solution to linearise n nested loops.

1) Assuming you have a simple nested loop structure like this:

%dummy function for demonstration purposes
f=@(a,b,c)([a,b,c]);

%three loops
X=cell(4,5,6);
for a=1:size(X,1);
    for b=1:size(X,2);
        for c=1:size(X,3);
            X{a,b,c}=f(a,b,c);
        end
    end
end

2) Basic linearisation using a for loop:

%linearized conventional loop
X=cell(4,5,6);
iterations=size(X);
for ix=1:prod(iterations)
    [a,b,c]=ind2sub(iterations,ix);
    X{a,b,c}=f(a,b,c);
end   

3) Linearisation using a parfor loop.

%linearized parfor loop
X=cell(4,5,6);
iterations=size(X);
parfor ix=1:prod(iterations)
    [a,b,c]=ind2sub(iterations,ix);
    X{ix}=f(a,b,c);
end

4) Using the second version with a conventional for loop, the order in which the iterations are executed is altered. If anything relies on this you have to reverse the order of the indices.

%linearized conventional loop
X=cell(4,5,6);
iterations=fliplr(size(X));
for ix=1:prod(iterations)
    [c,b,a]=ind2sub(iterations,ix);
    X{a,b,c}=f(a,b,c);
end

Reversing the order when using a parfor loop is irrelevant. You can not rely on the order of execution at all. If you think it makes a difference, you can not use parfor.



回答2:

You should be able to do this with bsxfun. I believe that bsxfun will parallelise code where possible (see here for more information), in which case you should be able to do the following:

bsxfun(@somefun,(1:6)',1:6);

You would probably want to benchmark this though.

Alternatively, you could do something like the following:

function parfor_allpairs(fun, num_rows, num_cols)

parfor i=1:(num_rows*num_cols)
    fun(mod(i-1,num_rows)+1,floor(i/num_cols)+1);
end

then call with:

parfor_allpairs(@somefun,6,6);


回答3:

Based on the answers from @DanielR and @MrAzzaman, I am posting two functions, iterlin and iterget in place of prod and ind2sub that allow iteration over ranges also if those do not start from one. An example for the pattern becomes

rng = [1, 4; 2, 7; 3, 10];
parfor k = iterlin(rng)
    [plate, row, col] = iterget(rng, k);
    % time-consuming computations here %
end

The script will process the wells in rows 2 to 7 and columns 3 to 10 on plates 1 to 4 without any workers idling while more wells are waiting to be processed. In hope that this helps someone, I deposited iterlin and iterget at the MATLAB File Exchange.