I want to create a matrix (A) whose elements are the averages of every four rows of another matrix (B). For example, row 1 of A should contain the column averages of rows 1 to 4 of B. Currently I use a loop to do this, but the matrices are very large, which makes the loop very time-consuming. Is there a better way to do it? Here is an example:
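B = matrix(runif(10000, 0, 10), 100, 100)
# A gets one row per complete block of 4 rows in B
A = matrix(0, floor(dim(B)[1]/4), dim(B)[2])
for (im in 1:floor(dim(B)[1]/4)) {
  # row im of A = column means of rows (4*im - 3) to (4*im) of B
  A[im, ] = colMeans(B[((im - 1)*4 + 1):(im*4), , drop = FALSE])
}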
You can achieve this with the rollapply function from the zoo package.
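A minimal sketch of the rollapply approach, assuming the zoo package is installed. Setting width = 4 and by = 4 makes the windows non-overlapping, and align = "left" starts each window at the row where it is evaluated, so the windows cover rows 1 to 4, 5 to 8, and so on. The seed and matrix size just mirror the question's example.

```r
library(zoo)  # assumes zoo is installed

set.seed(1)
B <- matrix(runif(10000, 0, 10), 100, 100)

# Non-overlapping windows of 4 rows, advancing 4 rows each step.
# With by.column = TRUE (the default), mean is applied to each column separately.
A_zoo <- rollapply(B, width = 4, by = 4, FUN = mean, align = "left")

dim(A_zoo)  # one row per block of 4 rows, one column per column of B
```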
You could vectorize this pretty easily using the rowsum function, whose default method works on matrices and calculates column sums by group. Then just divide by 4 to get the means.

Benchmarks

Since this is an optimization question, here are some benchmarks of all the proposed methods on a moderately sized data set. It turns out the OP's solution isn't so bad after all.
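A minimal base-R sketch of the rowsum approach, with a rough timing comparison against the original loop using system.time(). It assumes nrow(B) is a multiple of 4, so every group has exactly 4 rows and dividing by 4 is correct. The helper names loop_means and rowsum_means are hypothetical, and the matrix size for the timing run is arbitrary; timings depend on the machine, so no numbers are claimed here.

```r
set.seed(1)
B <- matrix(runif(10000, 0, 10), 100, 100)

# Original approach: loop over blocks of 4 rows.
loop_means <- function(B) {
  n <- nrow(B) %/% 4
  A <- matrix(0, n, ncol(B))
  for (im in seq_len(n)) {
    A[im, ] <- colMeans(B[(4 * im - 3):(4 * im), , drop = FALSE])
  }
  A
}

# rowsum approach: sum the rows sharing a group label, then divide by 4.
rowsum_means <- function(B) {
  stopifnot(nrow(B) %% 4 == 0)                # every group must have 4 rows
  g <- rep(seq_len(nrow(B) %/% 4), each = 4)  # 1,1,1,1,2,2,2,2,...
  unname(rowsum(B, g)) / 4                    # drop the group-label row names
}

# Rough timing on a larger, arbitrarily sized matrix.
big <- matrix(runif(2e6), 4000, 500)
print(system.time(a_loop <- loop_means(big)))
print(system.time(a_rowsum <- rowsum_means(big)))
stopifnot(isTRUE(all.equal(a_loop, a_rowsum)))
```

For serious comparisons, a package such as microbenchmark repeats each expression many times, which gives more stable numbers than a single system.time() call.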
aggregate can do this too, but requires subsequent coercion to a matrix. Note that if nrow(B) isn't a multiple of 4, the result will include a final row that contains the column averages of the last nrow(B) %% 4 rows.

As indicated by @thelatemail, tapply can do a neater job of this.
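A sketch of the aggregate and tapply approaches in base R. The code from the original answers isn't reproduced here, so treat this as one possible version. grp is a hypothetical helper vector of group labels; building it with length.out = nrow(B) is what gives a partial final group when nrow(B) isn't a multiple of 4.

```r
set.seed(1)
B <- matrix(runif(10000, 0, 10), 100, 100)

# Group label for each row: 1,1,1,1,2,2,2,2,...
# length.out = nrow(B) lets the last group be shorter than 4 rows.
grp <- rep(seq_len(ceiling(nrow(B) / 4)), each = 4, length.out = nrow(B))

# aggregate(): returns a data frame whose first column holds the group label,
# so drop that column and coerce the rest back to a matrix.
A_agg <- as.matrix(aggregate(B, by = list(grp), FUN = mean)[-1])

# tapply() inside apply(): for each column, average the values within each group.
# apply() places each column's vector of group means into a column of the result.
A_tap <- apply(B, 2, function(col) tapply(col, grp, mean))
```

For example, with a 10-row matrix grp would be 1 1 1 1 2 2 2 2 3 3, so the third row of the result would hold the column means of rows 9 and 10.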