I have an array that is 13867 x 2 elements, stored in a variable called "data". I want to do the following in MATLAB:
- Average (row 1 -> row 21); i.e. take the average of the first 21 elements
- Average (row 22 -> row 43); i.e. take the average of the next 22 elements
- Average (row 44 -> row 64); i.e. take the average of the next 21 elements
- Average (row 65 -> row 86); i.e. take the average of the next 22 elements
- Repeat the process until the end of the matrix, so that the last average is taken over the final 21 elements, from row 13847 to row 13867.
I want the average of the elements in column 1 and also in column 2. I have managed to do this in Excel, but it is a cumbersome task (I had to create an index for the rows first). I guess at the end we will get 645 averages.
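The Excel workaround in the question (an index column for the rows) translates fairly directly to MATLAB: give every row a block number and let accumarray average each group. A minimal sketch, assuming the pattern always starts with a 21-row block at row 1; rand stands in for the real data, and blockId and idx0 are illustrative names, not from the question.

```matlab
% Stand-in for the real 13867 x 2 matrix (assumption: replace with your data)
data = rand(13867, 2);

shortLen = 21;                 % rows in blocks 1, 3, 5, ...
longLen  = 22;                 % rows in blocks 2, 4, 6, ...
period   = shortLen + longLen; % one short + one long block = 43 rows

% Block label per row (the "index" column): two blocks per 43-row period,
% plus one once the offset inside the period passes the short block
idx0 = (0:size(data, 1) - 1).';
blockId = 2*floor(idx0/period) + 1 + (mod(idx0, period) >= shortLen);

% Average each column within each block; accumarray groups rows by blockId
result = zeros(max(blockId), size(data, 2));
for c = 1:size(data, 2)
    result(:, c) = accumarray(blockId, data(:, c), [], @mean);
end
```

Each row of result holds the column-1 and column-2 averages of one block, so for the question's data it should come out 645x2.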
The key to doing that is inserting NaN rows to make the shorter blocks (21 rows) the same size as the longer blocks (22 rows). This is easy with the insertrows function from the MATLAB File Exchange:
n = 21;
m = 22;
dataPad = insertrows(data, nan(1,size(data,2)), n:(n+m):size(data,1));
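insertrows is a File Exchange submission rather than a built-in. If it isn't installed, the same padding can be sketched with plain indexing: each original row moves down by the number of NaN rows inserted before it. This is an adaptation, not part of the answer; rand stands in for data, and nPad, r0 and shift are illustrative names. The resulting layout should match the one described here (row 22, row 66, ... are NaN).

```matlab
data = rand(13867, 2);          % stand-in for the real data (assumption)
n = 21;                         % short block length
m = 22;                         % long block length
nRows = size(data, 1);

% One NaN row goes after rows n, n+(n+m), n+2*(n+m), ... (same positions as the insertrows call)
nPad = numel(n:(n+m):nRows);

% Original row with 0-based offset r0 moves down by the number of
% insertion points that come before it
r0 = (0:nRows - 1).';
shift = floor(r0/(n+m)) + (mod(r0, n+m) >= n);

dataPad = nan(nRows + nPad, size(data, 2));   % start all-NaN
dataPad((1:nRows).' + shift, :) = data;       % drop the data rows into place
```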
After that, every block has 22 rows: row 22 will be [NaN, NaN], row 66 will be [NaN, NaN], and so on. Now it is easy to calculate the means. Simply reshape this matrix so that all values that should be averaged together are in the same column. Finally, use the nanmean function (a mean function that simply ignores NaN) to get the result.
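It is not 100% clear to me whether the result should be 645x2 or 645x1, i.e. whether the two columns should also be averaged together or not. Here are the corresponding reshapes for both cases:
1. Averaging both columns together (one value per block, returned as a 1x645 row vector). Transposing first makes the 22 rows x 2 columns of each block contiguous, so each column of dataPadRearr holds the 44 values of one block:
dataPadRearr = reshape(dataPad.',m*size(data,2),[]);
result = nanmean(dataPadRearr,1);
2. Averaging each column separately (645x2 result):
dataPadRearr = reshape(dataPad,m,[],size(data,2));
result = squeeze(nanmean(dataPadRearr,1));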
Note that here you'll need a final squeeze, as the result of nanmean would have dimensions 1x645x2, which is not very practical. squeeze removes the leading singleton dimension and leaves a 645x2 matrix.
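nanmean ships with the Statistics and Machine Learning Toolbox (newer MATLAB releases also accept mean(X,dim,'omitnan')). Without either, a NaN-ignoring mean can be written by zeroing the NaNs and dividing each sum by the count of non-NaN entries. A small sketch of the second variant on a hand-made padded matrix with blocks of 2 and 3 rows, so the numbers can be checked by eye; the toy values and the names X and valid are made up for illustration.

```matlab
% Toy padded matrix: blocks of 2 and 3 rows, with a NaN row appended to
% each 2-row block so every block has m = 3 rows
m = 3;
dataPad = [ 1 10;  2 20; NaN NaN;   % block 1 (padded)
            3 30;  4 40;  5  50;    % block 2
            6 60;  7 70; NaN NaN];  % block 3 (padded)

% Same reshape as the second variant: m x nBlocks x nCols
X = reshape(dataPad, m, [], size(dataPad, 2));

% NaN-ignoring mean along dim 1 without nanmean
valid = ~isnan(X);        % which entries are real data
X(~valid) = 0;            % NaNs contribute nothing to the sums
result = squeeze(sum(X, 1) ./ sum(valid, 1));   % 1 x 3 x 2 -> 3 x 2
% result is [1.5 15; 4 40; 6.5 65]
```

The same two lines (zeroing, then sum divided by count) should work as a drop-in for nanmean(...,1) in the first variant too.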
Here's one way to solve it with some padding with NaNs, reshaping and concatenations -
%// Input
A = rand(13867,2);
%// Two stepsizes
m = 21;
n = 22;
%// Combined stepsize
N = m+n;
%// Pad with NaNs to simplify reshaping & finding averages with nanmean
Apad = cat(1,A,nan(N*ceil(numel(A)/(2*N)) - numel(A)/2,2));
%// Reshape into a 3D array with N (combined stepsize) rows and one page per column of A
B = reshape(Apad,N,numel(Apad)/(2*N),[]);
%// Average the first m rows and the remaining n rows of each column, ignoring NaNs.
%// Interleave the two sets of averages and reshape into a (number of blocks) x 2 array
C = reshape(cat(1,nanmean(B(1:m,:,:),1),nanmean(B(m+1:end,:,:),1)),[],2);
%// Drop the all-NaN averages coming from the padding to get the final output
out = reshape(C(~isnan(C)),[],2);
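The padding and reshape lines in this answer hard-code two columns (numel(A)/2 and 2*N). A sketch of the same two steps written with size(A,1) and size(A,2), so it should work for any number of columns; this is an adaptation rather than the answer's code, shown on a hypothetical 17 x 3 input with the toy step sizes m = 3, n = 4.

```matlab
A = rand(17, 3);                     % hypothetical toy input with 3 columns
m = 3; n = 4; N = m + n;             % step sizes and combined step size
nRows = size(A, 1);
nCols = size(A, 2);

nPad = N*ceil(nRows/N) - nRows;      % NaN rows needed to reach a multiple of N
Apad = [A; nan(nPad, nCols)];        % pad at the bottom
B = reshape(Apad, N, [], nCols);     % N x (number of N-row chunks) x nCols
```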
Verify output
First four rows -
>> out(1:4,:)
ans =
0.55694 0.55289
0.49942 0.53502
0.57768 0.40828
0.6347 0.45194
>> mean(A(1:21,:),1)
ans =
0.55694 0.55289
>> mean(A(22:43,:),1)
ans =
0.49942 0.53502
>> mean(A(44:64,:),1)
ans =
0.57768 0.40828
>> mean(A(65:86,:),1)
ans =
0.6347 0.45194
Last row -
>> out(end,:)
ans =
0.44631 0.59432
>> mean(A(13847:13867,:),1)
ans =
0.44631 0.59432
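The spot checks above cover only a few blocks. A plain loop gives a reference for all blocks, which can then be compared with out. A minimal sketch, assuming the lengths alternate 21, 22 starting at row 1; rand stands in for the data, and ref and blockLens are illustrative names.

```matlab
A = rand(13867, 2);            % stand-in for the real data (assumption)
blockLens = [21 22];           % lengths alternate, starting with the short block

ref = zeros(0, size(A, 2));    % one row of column averages per block
first = 1;                     % first row of the current block
k = 0;                         % number of blocks finished so far
while first <= size(A, 1)
    len = blockLens(mod(k, 2) + 1);
    last = min(first + len - 1, size(A, 1));      % guards against a partial final block
    ref(end + 1, :) = mean(A(first:last, :), 1); %#ok<AGROW>
    first = last + 1;
    k = k + 1;
end
```

With out computed from the same A, max(abs(out(:) - ref(:))) should be at the level of floating-point round-off.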
Explanation with the help of a toy example
Sample used -
%// Input
A = rand(17,2)
%// Two stepsizes
m = 3;
n = 4;
1] Input :
A =
0.64775 0.30635
0.45092 0.50851
0.54701 0.51077
0.29632 0.81763
0.74469 0.79483
0.18896 0.64432
0.68678 0.37861
0.18351 0.81158
0.36848 0.53283
0.62562 0.35073
0.78023 0.939
0.081126 0.87594
0.92939 0.55016
0.77571 0.62248
0.48679 0.58704
0.43586 0.20774
0.44678 0.30125
2] Combined step-size (N = m + n) :
N =
7
3] Pad with NaN-filled rows such that the number of rows is a multiple of N -
Apad =
0.64775 0.30635
0.45092 0.50851
0.54701 0.51077
0.29632 0.81763
0.74469 0.79483
0.18896 0.64432
0.68678 0.37861
0.18351 0.81158
0.36848 0.53283
0.62562 0.35073
0.78023 0.939
0.081126 0.87594
0.92939 0.55016
0.77571 0.62248
0.48679 0.58704
0.43586 0.20774
0.44678 0.30125
NaN NaN
NaN NaN
NaN NaN
NaN NaN
4] This part might be a bit tricky. Each column of Apad is reshaped into a 2D array with N elements per column, because the intention here is to get averages along each of those columns after further splitting them into two subgroups: the first three rows and the remaining four rows. Since Apad has 2 columns, we get a 3D array with two 3D slices: the first slice is a reshaped version of the first column of Apad, i.e. of Apad(:,1), and the second slice corresponds to the second column of Apad. Thus, the resultant 3D array would be -
B(:,:,1) =
0.64775 0.18351 0.48679
0.45092 0.36848 0.43586
0.54701 0.62562 0.44678
0.29632 0.78023 NaN
0.74469 0.081126 NaN
0.18896 0.92939 NaN
0.68678 0.77571 NaN
B(:,:,2) =
0.30635 0.81158 0.58704
0.50851 0.53283 0.20774
0.51077 0.35073 0.30125
0.81763 0.939 NaN
0.79483 0.87594 NaN
0.64432 0.55016 NaN
0.37861 0.62248 NaN
5] Find mean/average along each column with nanmean(..,1), ignoring the NaNs -
>> nanmean(B(1:m,:,:),1)
ans(:,:,1) =
0.54856 0.39254 0.45648
ans(:,:,2) =
0.44188 0.56504 0.36534
>> nanmean(B(m+1:end,:,:),1)
ans(:,:,1) =
0.47919 0.64161 NaN
ans(:,:,2) =
0.65885 0.74689 NaN
6] Concatenate and reshape those averages into a 2D array -
C =
0.54856 0.44188
0.47919 0.65885
0.39254 0.56504
0.64161 0.74689
0.45648 0.36534
NaN NaN
7] Ignore NaN rows for final output -
out =
0.54856 0.44188
0.47919 0.65885
0.39254 0.56504
0.64161 0.74689
0.45648 0.36534
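Steps 6 and 7 rely on MATLAB's column-major order: stacking the short-block and long-block averages with cat(1,...) and then reading the result down each column interleaves them as block 1, block 2, block 3, and so on. A toy sketch with made-up round numbers in place of the nanmean outputs, so the interleaving is easy to see:

```matlab
% Made-up 1 x 3 x 2 "averages", shaped like the nanmean(B(...),1) outputs
shortAvg = cat(3, [1 3 5], [10 30 50]);      % blocks 1, 3, 5 (columns 1 and 2)
longAvg  = cat(3, [2 4 NaN], [20 40 NaN]);   % blocks 2, 4 and an empty padded block

% cat(1, ...) gives 2 x 3 x 2; reshape reads it column by column,
% so short and long block averages alternate
C = reshape(cat(1, shortAvg, longAvg), [], 2);
% C is [1 10; 2 20; 3 30; 4 40; 5 50; NaN NaN]

% Dropping the NaNs (the empty padded block) leaves one row per real block
out = reshape(C(~isnan(C)), [], 2);
```

The last step assumes the only all-NaN averages come from the padding, so that NaNs always appear as whole rows of C.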