Fun depending on order of subs and values in accum

2020-02-01 12:28发布

问题:

In accumarray() the first note about 'subs', first appeared in the MATLAB R14sp3 docs, says:

Note If the subscripts in subs are not sorted, fun should not depend on the order of the values in its input data.

It is not clear to me what is considered to be sorted. Suppose:

subs = [1 1
        2 1
        1 2
        2 2];
  1. shall subs be sorted in the sense issorted(subs,'rows'), or ...
  2. in the linear indexing sense, i.e. issorted(sub2ind([2 2],subs(:,1), subs(:,2)))

I would like to rely on:

accumarray(subs,val,[], @(x) x(end))

If somebody could also provide examples/tests from older releases (to check for backward compatibility) where e.g. 1) is false while 2) is true, that would be great.

PS. I am mostly not interested in alternatives to accumarray, unless very succinct and using the same subs and val.

回答1:

Ok I did some tests, and I think that "sorted" in the quote is meant in the linear indexing sense. (I'm on R2013a if that matters)

To understand how accumarray calls the specified function, I'll use the trick of grouping the values into a cellarray by specifying fun = @(x) {x} as the function to be applied.

1) 1D indices

First lets create some subscripts and values

N = 10; sz = 4;
subs = randi([1 sz], [N 1]);
vals = (1:N)'*100;

Now we call ACCUMARRAY on the non-sorted indices (multiple times)

C = cell(5,1);
for i=1:5
    C{i} = accumarray(subs, vals, [], @(x){x});
end

The order of the values passed to the function is arbitrary but still consistent across the multiple runs:

>> assert(isequal(C{:}))
>> celldisp(C{1})
ans{1} =
   800
   900
   700
ans{2} =
   300
ans{3} =
        1000
         200
         100
ans{4} =
   400
   600
   500

This is why the documentation warns you that fun should not depend on the order of the values passed to it.

Now if we sort the subscripts beforehand:

[~,ord] = sort(subs);
C = cell(5,1);
for i=1:5
    C{i} = accumarray(subs(ord), vals(ord), [], @(x){x});
end
assert(isequal(C{:}))
celldisp(C{1})

we will see that the values are passed to the function sorted themselves:

ans{1} =
   700
   800
   900
ans{2} =
   300
ans{3} =
         100
         200
        1000
ans{4} =
   400
   500
   600

2) 2D indices

I've tried the same thing in the case of 2D subscript indices. First we start with random data:

%# some 2d subscripts and corresponding values
N = 10; sz = 2;
subs = randi([1 sz], [N 2]);
vals = (1:N)*100;
  • Here is the case for non-sorted indices:

    C = cell(5,1);
    for i=1:5
        C{i} = accumarray(subs, vals, [], @(x){x});
    end
    assert(isequal(C{:}))
    celldisp(C{1})
    
  • Here is when I tried "sorting by rows":

    [~,ord] = sortrows(subs, [1 2]);
    C = cell(5,1);
    for i=1:5
        C{i} = accumarray(subs(ord,:), vals(ord), [], @(x){x});
    end
    assert(isequal(C{:}))
    celldisp(C{1})
    
  • And finally here is when we sort by "linear index":

    [~,ord] = sort(sub2ind([sz sz], subs(:,1), subs(:,2)));
    C = cell(5,1);
    for i=1:5
        C{i} = accumarray(subs(ord,:), vals(ord), [], @(x){x});
    end
    assert(isequal(C{:}))
    celldisp(C{1})
    

I'll omit the long output, and report that only in the last case that the values were passed to the function ordered. So I conclude that the "sorted" criteria is based on the linear indices.

ans{1,1} =
     []
ans{2,1} =
   200
   600
   700
ans{1,2} =
         100
         300
         400
         500
        1000
ans{2,2} =
   800
   900

Bonus:

Try using the following function instead:

function out = fun(x)
    out = {x};
    disp('[')
    disp(x)
    disp(']')
end

you'll see that the function is called in an unpredictable way, even with some duplication! You might have to increase the size of the data to see such behavior...


As far as backward compatibility goes, the documentation mentions this note all the way back in MATLAB R14sp3 but not R14sp2. Given the fact that it's been documented since 2005, I'd say it is safe to rely on this feature (I doubt anyone using such old versions expects new code to just work anyway!)