Suppose that we have this code in MATLAB:
parpool('local',2) % Create a parallel pool
W = ones(6,6);
W = distributed(W); % Distribute to the workers
spmd
T = W*2; % Calculation performed on workers, in parallel
% T and W are both codistributed arrays here
end
T % View results in client.
whos % T and W are both distributed arrays here
delete(gcp) % Stop pool
I read in the documentation that the difference between normal arrays and distributed arrays is this: when we use distributed arrays, they are sent directly to the workers and no copy of the array stays on the client. So do we have no access to these arrays on the client? Is this the only difference?
What is the difference in the structure and output of the code if we remove the line W = distributed(W);? What is the purpose of using a distributed array?
What is the difference between distributed and codistributed? As I read in the documentation, we can only use codistributed inside an spmd block. Is that true?
Distributed arrays are stored on the workers, not the client, and operations on them are carried out in parallel by the workers - that's the point of them.
The difference between distributed and codistributed arrays is only one of perspective. From the point of view of the client, they are distributed arrays; from the point of view of the workers, they are codistributed arrays.
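A quick way to see this change of perspective is to ask for the class of the same variable on the client and inside an spmd block. This is only a minimal sketch: it assumes the Parallel Computing Toolbox is installed, and the variable names clientClass and workerClass are made up for the illustration.

```matlab
% Assumption: Parallel Computing Toolbox is available.
% Open a pool with the default profile only if none is running yet.
if isempty(gcp('nocreate'))
    parpool(2);
end

V = distributed(ones(6,6));   % data now lives on the workers

clientClass = class(V);       % class of V as seen from the client

spmd
    workerClass = class(V);   % class of V as seen from each worker
end

disp(clientClass)             % distributed
disp(workerClass{1})          % codistributed (value reported by worker 1)
```

workerClass comes back to the client as a Composite, so workerClass{1} reads the value that worker 1 assigned.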
To illustrate, first start a pool:
>> parpool('local',2)
Then create an array:
>> W = ones(6,6);
W is stored on the client.
Now create a distributed array from W:
>> V = distributed(W);
V is stored on the workers, with a part of it on each worker. You still have access to V from the client, but when you use it there, its contents are pulled back from the workers.
>> V
V =
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
1 1 1 1 1 1
Note that in the Workspace Browser, V is there as a 6x6 distributed array, not a 6x6 double like W.
Now although from the point of view of the client V is a distributed array, from the point of view of the workers, V is a codistributed array.
>> spmd; disp(V); end
Lab 1:
LocalPart: [6x3 double]
Codistributor: [1x1 codistributor1d]
Lab 2:
LocalPart: [6x3 double]
Codistributor: [1x1 codistributor1d]
You can see that V is codistributed, and that only half of it (6x3) is stored on each worker.
When you do something with V, that happens on the workers, in parallel, and the results are stored on the workers as a distributed/codistributed array:
>> spmd; T = V*2; end
>> spmd; disp(T); end
Lab 1:
LocalPart: [6x3 double]
Codistributor: [1x1 codistributor1d]
Lab 2:
LocalPart: [6x3 double]
Codistributor: [1x1 codistributor1d]
You have access to T from the client just as you did with V, but to explicitly bring it back, use gather:
>> S = gather(T);
Note that S is now a 6x6 double, not a distributed array.
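The whole round trip (distribute, compute on the workers, look at the local piece, gather) can be put into one self-contained script. Treat it as a sketch under the same assumptions as the session above; localSize is a name introduced here only for illustration, and the size it reports depends on how many workers the pool has.

```matlab
% Assumption: Parallel Computing Toolbox is available.
if isempty(gcp('nocreate'))
    parpool(2);
end

V = distributed(ones(6,6));              % spread over the workers

spmd
    T = V*2;                             % computed on the workers, in parallel
    localSize = size(getLocalPart(T));   % size of the piece held by this worker
end

disp(localSize{1})                       % size of worker 1's piece

S = gather(T);                           % bring the full result back to the client
disp(class(S))                           % double
```

getLocalPart only works inside spmd, where T is codistributed; on the client, gather is the way to get an ordinary array back.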
To 1.) There are surely other minor differences, but at least the way you index and manipulate the elements of a distributed array should be the same as for a normal array.
To 2.) You could easily try this out for yourself. Anyway, the result is a Composite: the normal array is copied to each worker when the spmd block executes, the calculation is performed once on every worker, and each worker's result is stored separately. I would use the "normal" type for constant input data (parameters) and distributed for variables that are used for computing the output (and that define its size).
Example:
x = distributed(1:100); % variable, output will be calculated on -> distributed
a = 5; % amplitude (constant parameter -> "normal")
spmd
y = a * sin(x);
end
y
This also explains the purpose of distributed: to enable parallel calculation on a matrix.
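For comparison, here is roughly what the code from the question does when the W = distributed(W); line is removed. It is a sketch that assumes the Parallel Computing Toolbox; T1 is just an illustrative name.

```matlab
% Assumption: Parallel Computing Toolbox is available.
if isempty(gcp('nocreate'))
    parpool(2);
end

W = ones(6,6);        % ordinary array, stored on the client

spmd
    T = W*2;          % W is copied to every worker; each one computes the full 6x6 product
end

disp(class(T))        % Composite
T1 = T{1};            % worker 1's copy of the result, an ordinary 6x6 double
disp(size(T1))
```

Every worker repeats the same work on the same data, so nothing is gained from the pool here; with a distributed W each worker only handles its own part.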
To 3.) Distributed means its elements are spread over the workers. Codistributed means the same array as seen from the workers: inside an spmd block each worker holds one part of it and works on that part together with the other workers. I guess (but am not sure) that the array stays on the workers as long as the parallel pool stays open, but from outside the spmd block it can only be accessed as a distributed array.
The documentation says:
Codistributed arrays on workers that you create inside spmd statements
or from within task functions of communicating job can be accessed as
distributed arrays on the client.
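The quoted sentence can be checked the other way round as well, by creating the array inside spmd and then looking at it from the client. A minimal sketch, assuming the Parallel Computing Toolbox; the variable name C is arbitrary.

```matlab
% Assumption: Parallel Computing Toolbox is available.
if isempty(gcp('nocreate'))
    parpool(2);
end

spmd
    C = ones(6,6,'codistributed');   % created on the workers as a codistributed array
end

disp(class(C))                       % distributed, when viewed from the client
A = gather(C);                       % ordinary 6x6 double on the client
```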