Matlab: avoid for loops to find the maximum among

2019-02-26 22:28发布

问题:

I need to find the maximum among values with same labels, in matlab, and I am trying to avoid using for loops.

Specifically, I have an array L of labels and an array V of values, same size. I need to produce an array S which contains, for each value of L, the maximum value of V. An example will explain better:

L = [1,1,1,2,2,2,3,3,3,4,4,4,1,2,3,4]

V = [5,4,3,2,1,2,3,4,5,6,7,8,9,8,7,6]

Then, the values of the output array S will be:

s(1) = 9 (the values V(i) such that L(i) == 1 are: 5,4,3,9 -> max = 9)

s(2) = 8 (the values V(i) such that L(i) == 2 are: 2,1,2,8 -> max = 8)

s(3) = 7 (the values V(i) such that L(i) == 3 are: 3,4,5,7 -> max = 7)

s(4) = 8 (the values V(i) such that L(i) == 4 are: 6,7,8,6 -> max = 8)

this can be trivially implemented by traversing the arrays L and V with a for loop, but in Matlab for loops are slow, so I was looking for a faster solution. Any idea?

回答1:

This is a standard job for accumarray.

Three cases need to be considered, with increasing generality:

  • Integer labels.
  • Integer labels, specify fill value.
  • Remove gaps; or non-integer labels. General case.

Integer labels

You can just use

S = accumarray(L(:), V(:), [], @max).';

In your example, this gives

>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7];
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6];
>> S = accumarray(L(:), V(:), [], @max).'
S =
     9     8     7     8

Integer labels, specify fill value

If there are gaps between integers in L, the above will give a 0 result for the non-existing labels. If you want to change that fill value (for example to NaN), use a fifth input argument in acccumarray:

S = accumarray(L(:), V(:), [], @max, NaN).';

Example:

>> L = [1 1 1 2 2 2 3 3 3 4 4 4 1 2 3 7]; %// last element changed
>> V = [5 4 3 2 1 2 3 4 5 6 7 8 9 8 7 6]; %// same as in your example
>> S = accumarray(L(:), V(:), [], @max, NaN).'
S =
     9     8     7     8   NaN   NaN     6

Remove gaps; or non-integer labels. General case

When the gaps between integer labels are large, using a fill value may be inefficient. In that case you may want to get only the meaningful values in S, without fill values, i.e.skip non-existing labels. Also, it may be the case that L doesn't necessarily contain integers.

These two issues are solved by applying unique to the labels before using accumarray:

[~, ~, Li] = unique(L); %// transform L into consecutive integers
S = accumarray(Li(:), V(:), [], @max, NaN).';

Example:

>> L = [1.5 1.5 1.5 2 2 2 3 3 3 4 4 4 1 2 3 7.8]; %// note: non-integer values
>> V = [5   4   3   2 1 2 3 4 5 6 7 8 9 8 7 6  ]; %// same as in your example
>> [~, ~, Li] = unique(L); %// transform L into consecutive integers
>> S = accumarray(Li(:), V(:), [], @max, NaN).'
S =
     9     5     8     7     8     6


回答2:

helper=[L.', V.'];
helper=sortrows(helper,-2);
[~,idx,~]=unique(helper(:,1));
S=helper(idx,2);

What I do is: I join the two arrays as columns. Then I sort them regarding second column with biggest element first. Then I get the idx of the unique Values in L before I return the corresponding Values from V.

The solution from Luis Mendo is faster. But as far as I see his solution doesn't work if there is a zero,negative value or a noninteger inside L:

Luis solution: Elapsed time is 0.722189 seconds.
My solution: Elapsed time is 2.575943 seconds.

I used:

L= ceil(rand(1,500)*10);
V= ceil(rand(1,500)*250);

and ran the code 10000 times.