Question:
I have the following matrix of depth and temperature data (855 rows, 2 columns) and would like to take the mean of every 3 rows within each column. For example:
[1,] -6.7 18.91
[2,] -5.4 18.91
[3,] -4.0 18.59
[4,] -6.7 20.37
[5,] -6.7 20.05
[6,] -2.7 20.21
[7,] -4.0 21.03
[8,] -5.4 20.70
[9,] -4.0 20.87
[10,] -2.7 21.37
[11,] -2.7 21.37
[12,] -2.7 21.37
That is, I want to compute
mean(data[1:3,1])
mean(data[4:6,1])
and so on for the entire matrix. How can I accomplish this without manually writing the code for the mean of every 3 rows? Any ideas or suggestions are greatly appreciated.
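The manual calls above follow a fixed pattern, so one way to generate them for every block is to loop over the first row of each block. This is a minimal base-R sketch, not part of the original post; it assumes the matrix is named `data` and that its row count is a multiple of 3 (true for 855 rows, which gives 285 blocks). The 12 sample rows are entered by hand for illustration.

```r
# The 12 sample rows from the question (depth, temperature)
data <- matrix(c(-6.7, 18.91, -5.4, 18.91, -4.0, 18.59,
                 -6.7, 20.37, -6.7, 20.05, -2.7, 20.21,
                 -4.0, 21.03, -5.4, 20.70, -4.0, 20.87,
                 -2.7, 21.37, -2.7, 21.37, -2.7, 21.37),
               ncol = 2, byrow = TRUE)

# First row of each block of 3: 1, 4, 7, 10, ...
starts <- seq(1, nrow(data), by = 3)

# Equivalent to mean(data[1:3, 1]), mean(data[4:6, 1]), ... for every block
depth_means <- sapply(starts, function(i) mean(data[i:(i + 2), 1]))
print(depth_means)
```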
Answer 1:
Use the rollapply function from the zoo package. See ?rollapply for more details.
library(zoo)
# data is your 855 x 2 matrix: means of rows 1-3, 4-6, ... of column 1
rollapply(data[,1], width=3, mean, by=3)
Example:
> set.seed(1)
> Data <- matrix(rnorm(30, 100, 50), ncol=2) # some random data
> rollapply(Data[,1], width=3, mean, by=3)
[1] 78.69268 118.40534 130.02559 126.60393 71.48317
> # you can verify this by computing a few block means by hand:
> mean(Data[1:3, 1])
[1] 78.69268
> mean(Data[4:6, 1])
[1] 118.4053
> mean(Data[7:9, 1]) # and so on ...
[1] 130.0256
If you want the mean for all columns of your matrix, pass the whole matrix and add by.column=TRUE (this is the default, so it can also be omitted) to the rollapply call:
> rollapply(Data, width=3, mean, by=3, by.column=TRUE)
[,1] [,2]
[1,] 78.69268 114.71187
[2,] 118.40534 138.90166
[3,] 130.02559 81.12249
[4,] 126.60393 106.79836
[5,] 71.48317 74.48399
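If installing zoo is not an option, the same non-overlapping block means (width = by = 3) can be computed in base R by reshaping the matrix into a 3 × (n/3) × ncol array and averaging over the first dimension with colMeans. This is a sketch that is not part of the original answer; the helper name `block_means` is hypothetical, and it assumes the row count is an exact multiple of the block size.

```r
# Hypothetical helper: mean of every k consecutive rows, per column
block_means <- function(x, k = 3) {
  # Assumes nrow(x) is an exact multiple of k (855 rows with k = 3 is fine)
  stopifnot(nrow(x) %% k == 0)
  # Column-major fill: arr[, b, j] holds the k rows of block b in column j
  arr <- array(x, dim = c(k, nrow(x) / k, ncol(x)))
  # Average over the first dimension -> (nrow / k) x ncol matrix
  colMeans(arr, dims = 1)
}

set.seed(1)
Data <- matrix(rnorm(30, 100, 50), ncol = 2)  # same random data as above
res <- block_means(Data)
print(res)
# Spot-check one block against the manual calculation (prints TRUE)
all.equal(res[2, 1], mean(Data[4:6, 1]))
```

Because the reshape only reinterprets the matrix's storage, no loop or grouping factor is needed; the trade-off is that it cannot handle a final incomplete block.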
Answer 2:
Try using tapply and apply:
R > f <- rep(c(1:3), each = 3)
R > f
[1] 1 1 1 2 2 2 3 3 3
R > x <- matrix(1:27, 9, 3)
R > x
[,1] [,2] [,3]
[1,] 1 10 19
[2,] 2 11 20
[3,] 3 12 21
[4,] 4 13 22
[5,] 5 14 23
[6,] 6 15 24
[7,] 7 16 25
[8,] 8 17 26
[9,] 9 18 27
R > apply(x, 2, function(t) tapply(t, f, mean))
[,1] [,2] [,3]
1 2 11 20
2 5 14 23
3 8 17 26
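The factor f above is hard-coded for 9 rows. As a hedged generalization (not from the original answer), the group labels can be derived from nrow() so the same call works for the 855-row matrix, giving 285 groups; the name `grp` is illustrative. Base R's rowsum() offers a second route: group sums divided by group sizes.

```r
x <- matrix(1:27, 9, 3)

# One label per row: 1,1,1,2,2,2,3,3,3 (works for any nrow)
grp <- (seq_len(nrow(x)) - 1) %/% 3 + 1

# Per-column group means, as in the answer above
by_tapply <- apply(x, 2, function(col) tapply(col, grp, mean))

# Alternative: per-group column sums divided by the number of rows per group
by_rowsum <- rowsum(x, grp) / as.vector(table(grp))

print(by_tapply)
```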
Answer 3:
I really like the 'rollapply' function for this, because its syntax closely matches what you're trying to do. However, I thought I would contribute, for posterity, how you would approach this problem with the 'plyr' package.
Note: You could do this all in one statement, but I've broken it up to make it easier to understand.
Step 1: Set up your data to have a sorting variable.
data.plyr <- data.frame(test, group=floor((1:nrow(test)-1)/3)+1)
Here test is your original matrix. I've just added a column 'group' that assigns a group number to every three rows. The two matrix columns are now named 'X1' and 'X2' by default.
Step 2: Run the 'colMeans' function for each group.
library(plyr)
ddply(data.plyr, .(group), colMeans)
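For comparison, Step 2 can also be done without plyr using base R's aggregate with a formula. This is a sketch, not the answer's own method; it reuses Step 1 unchanged and fills `test` with the question's 12 sample rows for illustration. Because the grouping is an explicit column, a trailing partial group (if the row count were not a multiple of 3) would simply become a smaller group.

```r
# Question's 12 sample rows; `test` mirrors the name used in the answer
test <- matrix(c(-6.7, 18.91, -5.4, 18.91, -4.0, 18.59,
                 -6.7, 20.37, -6.7, 20.05, -2.7, 20.21,
                 -4.0, 21.03, -5.4, 20.70, -4.0, 20.87,
                 -2.7, 21.37, -2.7, 21.37, -2.7, 21.37),
               ncol = 2, byrow = TRUE)

# Step 1 from the answer: a group label for every 3 rows
data.plyr <- data.frame(test, group = floor((1:nrow(test) - 1) / 3) + 1)

# Step 2 in base R: mean of every other column (X1, X2) within each group
res <- aggregate(. ~ group, data = data.plyr, FUN = mean)
print(res)
```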
For this specific question, I think the 'plyr' package is sub-optimal, but it's worth noting the method for future reference. The 'apply' family and 'rollapply' functions work best when the groups are contiguous, equal-sized blocks of rows, as they are here. In applications where you want more flexibility, the 'plyr' family functions are useful to have in your toolbox.