The language is R.
I have an nxm matrix, and I'd like to partition it into 3x3 sections and calculate the mean (or any function) within each. (If there's a leftover bit that isn't 3x3 then use just what's left).
I'm sure there's an apply
-ish way to do this -- it's on the tip of my tongue -- but my brain is currently failing me.
I suppose it's a bit like a moving window question except I want non-overlapping windows (so it's easier).
Can anyone think of an inbuilt function that does this? Or a vectorised way?
Here's my loopy version:
winSize <- 3
mat <- matrix(runif(6*11),nrow=6,ncol=11)
nr <- nrow(mat)
nc <- ncol(mat)
outMat <- matrix(NA,nrow=ceiling(nr/winSize),
ncol=ceiling(nc/winSize))
FUN <- mean
for ( i in seq(1,nr,by=winSize) ) {
for ( j in seq(1,nc,by=winSize) ) {
# work out mean in 3x3 window, fancy footwork
# with pmin just to make sure we don't go out of bounds
outMat[ ceiling(i/winSize), ceiling(j/winSize) ] <-
FUN(mat[ pmin(i-1 + 1:winSize,nr), pmin(j-1 + 1:winSize,nc)])
}
}
cheers.
Just want to summarise the different methods for this.
First, @VincentZoonekynd's solution. This is very general -- it allows me to apply any function to my matrix. However it is a little slow because I am applying these to matrices of order ~5000x1000x3 and want back out a (5000/kernelSize) x (1000/kernelSize) x 3 image.
First, generate a matrix to test on (I made it smaller so as not to kill my computer whilst testing various methods):
Method 0: loop
Method 1: tapply
This gives:
Which are a bit slow, given that I would like to apply this to bigger
img
matrices.What surprised me (at first) was that METHOD 0 (loops) was faster than METHOD 1 (
tapply
).However, then I remembered that
tapply
has a reputation for being not much faster than an explicit loop (why is that? I remember reading it somewhere...the function code looks like it might do the for loop anyway, as opposed to calling external code).I also have this general feeling that
vapply
andsapply
are the fast versions ofapply
(again, not sure if this is definitively true but I've certainly found so).Method 2: vapply
So, I tried to rewrite my loopy version using
vapply
. (There's probably a better way to do handle the 3rd dimension, but oh well...). This basically generates a big list of coordinates intoimg
. The coordinates give the(i,j)
of the corner of eachkernelSize*kernelSize
square.Then
vapply
loops through them all and calculates the mean.This gives:
So, it is a bit faster than a loop to use
vapply
(I do feel like I'm not getting as much out ofvapply
as I could be though...like I'm not using it in the right way).Method 3: filter
This still isn't quite fast enough, so I then incorporated the information that I only want a mean in each window, and this is basically a convolution of
[ 1/3 1/3 1/3 ]
with the matrix in each dimension.This loses the general applicability of applying an arbitrary
FUN
but gets big speedups in return.Basically, I make a kernel
[1/3, 1/3, 1/3]
and convolve it withimg
twice, once in the x direction and once in the y. Then I only extract every 3rd value (since I wanted non-overlapping windows).This seems a bit wasteful to me in that I calculate the mean for every 3x3 window in my original matrix, instead of just non-overlapping windows, but I don't know how to tell R not to calculate those values that I'm going to throw away anyway.
However you have to take a bit of care at the borders -- say there's only a 2x2 patch left over, then the mean is over 4 instead of 9 values. My current code doesn't handle this, but I don't mind if it's just the border that's out, because I'm only doing the downsampling for display purposes.
(It would be nice to fix this one last thing though...)
This returns:
So this method is by far the quickest, and the error just along the last column of
out3
(once per channel), since (I guess) there are border conditions.You can use
row
andcol
to extract the row and column numbers, and then compute the coordinates of each block.Edit: This can be generalized to higher-dimensional arrays, by replacing
row
andcol
with the following function.