In accumarray()
the first note about 'subs', first appeared in the MATLAB R14sp3 docs, says:
Note If the subscripts in subs are not sorted, fun should not depend on the order of the values in its input data.
It is not clear to me what is considered to be sorted. Suppose:
subs = [1 1
2 1
1 2
2 2];
- shall
subs
be sorted in the sense issorted(subs,'rows')
, or ...
- in the linear indexing sense, i.e.
issorted(sub2ind([2 2],subs(:,1), subs(:,2)))
I would like to rely on:
accumarray(subs,val,[], @(x) x(end))
If somebody could also provide examples/tests from older releases (to check for backward compatibility) where e.g. 1) is false while 2) is true, that would be great.
PS. I am mostly not interested in alternatives to accumarray
, unless very succinct and using the same subs
and val
.
Ok I did some tests, and I think that "sorted" in the quote is meant in the linear indexing sense. (I'm on R2013a if that matters)
To understand how accumarray
calls the specified function, I'll use the trick of grouping the values into a cellarray by specifying fun = @(x) {x}
as the function to be applied.
1) 1D indices
First lets create some subscripts and values
N = 10; sz = 4;
subs = randi([1 sz], [N 1]);
vals = (1:N)'*100;
Now we call ACCUMARRAY on the non-sorted indices (multiple times)
C = cell(5,1);
for i=1:5
C{i} = accumarray(subs, vals, [], @(x){x});
end
The order of the values passed to the function is arbitrary but still consistent across the multiple runs:
>> assert(isequal(C{:}))
>> celldisp(C{1})
ans{1} =
800
900
700
ans{2} =
300
ans{3} =
1000
200
100
ans{4} =
400
600
500
This is why the documentation warns you that fun
should not depend on the order of the values passed to it.
Now if we sort the subscripts beforehand:
[~,ord] = sort(subs);
C = cell(5,1);
for i=1:5
C{i} = accumarray(subs(ord), vals(ord), [], @(x){x});
end
assert(isequal(C{:}))
celldisp(C{1})
we will see that the values are passed to the function sorted themselves:
ans{1} =
700
800
900
ans{2} =
300
ans{3} =
100
200
1000
ans{4} =
400
500
600
2) 2D indices
I've tried the same thing in the case of 2D subscript indices. First we start with random data:
%# some 2d subscripts and corresponding values
N = 10; sz = 2;
subs = randi([1 sz], [N 2]);
vals = (1:N)*100;
Here is the case for non-sorted indices:
C = cell(5,1);
for i=1:5
C{i} = accumarray(subs, vals, [], @(x){x});
end
assert(isequal(C{:}))
celldisp(C{1})
Here is when I tried "sorting by rows":
[~,ord] = sortrows(subs, [1 2]);
C = cell(5,1);
for i=1:5
C{i} = accumarray(subs(ord,:), vals(ord), [], @(x){x});
end
assert(isequal(C{:}))
celldisp(C{1})
And finally here is when we sort by "linear index":
[~,ord] = sort(sub2ind([sz sz], subs(:,1), subs(:,2)));
C = cell(5,1);
for i=1:5
C{i} = accumarray(subs(ord,:), vals(ord), [], @(x){x});
end
assert(isequal(C{:}))
celldisp(C{1})
I'll omit the long output, and report that only in the last case that the values were passed to the function ordered. So I conclude that the "sorted" criteria is based on the linear indices.
ans{1,1} =
[]
ans{2,1} =
200
600
700
ans{1,2} =
100
300
400
500
1000
ans{2,2} =
800
900
Bonus:
Try using the following function instead:
function out = fun(x)
out = {x};
disp('[')
disp(x)
disp(']')
end
you'll see that the function is called in an unpredictable way, even with some duplication! You might have to increase the size of the data to see such behavior...
As far as backward compatibility goes, the documentation mentions this note all the way back in MATLAB R14sp3 but not R14sp2. Given the fact that it's been documented since 2005, I'd say it is safe to rely on this feature (I doubt anyone using such old versions expects new code to just work anyway!)