Matlab matching first column of a row as index and

Posted 2019-04-16 18:45

Question:

I need help taking the following data, which is organized in a large matrix, averaging all of the values that share a matching ID (index), and outputting another matrix containing just each ID and the averaged values that trail it.

File with data format:
(This is the StarData variable)
ID>>>>Values

002141865 3.867144e-03  742.000000  0.001121  16.155089  6.297494  0.001677

002141865 5.429278e-03  1940.000000  0.000477  16.583748  11.945627  0.001622

002141865 4.360715e-03  1897.000000  0.000667  16.863406  13.438383  0.001460

002141865 3.972467e-03  2127.000000  0.000459  16.103060  21.966853  0.001196

002141865 8.542932e-03  2094.000000  0.000421  17.452007  18.067214  0.002490

Do not be misled by the example I posted: the first number is repeated for about 15 lines, then the ID changes, and so on through an entire set of different IDs. The whole set of IDs is then repeated as a group. Think of the first block as [1 2 3; 1 5 9; 2 5 7; 2 4 6]; the block then repeats with different values in every column except the index. The only thing that differs is the values trailing the ID, which I need to average in MATLAB, outputting a clean matrix with one row per ID, fully averaged over all occurrences of that ID. Thanks for any help given.

Answer 1:

A modification of this answer does the job, as follows:

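[value_sort, ind_sort] = sort(StarData(:,1)); % sort the IDs, keep the permutation
[~, ii, jj] = unique(value_sort); % ii: last row of each ID group, jj: group number of each row
n = diff([0; ii]); % number of rows in each ID group
averages = NaN(length(n),size(StarData,2)); % preallocate
averages(:,1) = value_sort(ii); % ii indexes the sorted IDs, not StarData itself
for col = 2:size(StarData,2)
  % sum each column per ID (rows taken in sorted order), then divide by the group size
  averages(:,col) = accumarray(jj,StarData(ind_sort,col))./n;
end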

The result is in the variable averages. Its first column contains the ID values used as indices, and each subsequent column contains the average of that column over all rows that share the ID.
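As a quick sanity check of what averages should contain, here is a minimal sketch on the toy matrix from the question. It swaps in a slightly different technique from the code above: it uses only the third output of unique together with accumarray and @mean, so it does not rely on which occurrence unique reports in its second output. The variable name ids is an assumption introduced for illustration.

```matlab
% Toy data from the question: column 1 is the ID, the rest are values.
StarData = [1 2 3; 1 5 9; 2 5 7; 2 4 6];

% ids: sorted unique IDs; jj: for every row, the position of its ID in ids.
[ids, ~, jj] = unique(StarData(:,1));
jj = jj(:); % accumarray expects a column of subscripts

averages = NaN(numel(ids), size(StarData,2)); % preallocate
averages(:,1) = ids;
for col = 2:size(StarData,2)
    % average the entries of this column that belong to the same ID
    averages(:,col) = accumarray(jj, StarData(:,col), [], @mean);
end
% averages is now [1 3.5 6; 2 4.5 6.5]
```

Because the rows are grouped through jj, the data does not have to be sorted first, which suits input where the whole set of IDs repeats in several blocks.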

Compatibility issues for Matlab 2013a onwards:

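The function unique changed in Matlab 2013a: by default its second output now gives the index of the first occurrence of each value rather than the last, and the line n = diff([0; ii]) relies on the last-occurrence indices. For that version onwards, add the 'legacy' flag to unique, i.e. replace the second line with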

[~, ii, jj] = unique(value_sort,'legacy')
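If the same script has to run on releases both before and after that change, one option is to pick the call at run time. The following is only a sketch under stated assumptions: it uses verLessThan with version '8.1' (the version number of R2013a), and the StarData values are hypothetical toy data with deliberately unsorted IDs.

```matlab
% Hypothetical toy data: column 1 is the ID, rows deliberately unsorted.
StarData = [2 5 7; 1 2 3; 2 4 6; 1 5 9];

[value_sort, ind_sort] = sort(StarData(:,1));
if verLessThan('matlab', '8.1')   % releases before R2013a
    [~, ii, jj] = unique(value_sort);
else                              % R2013a onwards: ask for the old index convention
    [~, ii, jj] = unique(value_sort, 'legacy');
end
ii = ii(:); jj = jj(:);           % force column vectors for diff and accumarray

n = diff([0; ii]);                % rows per ID (ii marks the last row of each group)
averages = NaN(length(n), size(StarData,2)); % preallocate
averages(:,1) = value_sort(ii);
for col = 2:size(StarData,2)
    averages(:,col) = accumarray(jj, StarData(ind_sort,col)) ./ n;
end
% averages is now [1 3.5 6; 2 4.5 6.5]
```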