`tapply()` to return data frame

Posted 2019-04-11 17:57

I have a dataset with a datetime column "date" (POSIXct), a "node" column (factor) and a "c" column (numeric), for example:

                 date node           c
1 2011-08-14 10:30:00    2 0.051236000
2 2011-08-14 10:30:00    2 0.081230000
3 2011-08-14 10:31:00    1 0.000000000
4 2011-08-14 10:31:00    4 0.001356337
5 2011-08-14 10:31:00    3 0.001356337
6 2011-08-14 10:32:00    2 0.000000000

I need to take the mean of column "c" for all pairs of "date" and "node", so I did this:

tapply(data$c, list(data$node, data$date), mean)

The result I obtain is what I want, but in a strange structure:

num [1:5, 1:8923] 0 0 0.00092 0.00146 NA ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:5] "1" "2" "3" "4" ...
  ..$ : chr [1:8923] "2011-08-14 10:30:00" "2011-08-14 10:31:00" "2011-08-14 10:32:00" "2011-08-14 10:33:00" ...

A sample of the output looks like this:

  2011-08-17 23:56:00 2011-08-17 23:57:00 2011-08-17 23:58:00
1        4.759077e-05        4.759077e-05        4.759077e-05
2        0.000000e+00        3.875248e-05        1.595690e-04
3        1.134391e-03        1.134391e-03        1.109730e-03
4        4.882813e-04        6.914658e-04        4.955846e-04
5        0.000000e+00        0.000000e+00        0.000000e+00

What I was going for was something like the original structure: a data frame with the datetime, the node factor and the mean "c" value as columns. I cannot figure out how to achieve this. Any help would be appreciated.

Many thanks.

Tags: r apply
3 answers
虎瘦雄心在
#2 · 2019-04-11 18:37

You might try...

aggregate( c ~ node + date, data = data, FUN = mean )
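For reference, here is a minimal sketch of that call on the six sample rows from the question. The UTC time zone and the way `data` is constructed are assumptions, made only so the example runs on its own.

```r
# Sample rows copied from the question; tz = "UTC" is an assumption
data <- data.frame(
  date = as.POSIXct(c("2011-08-14 10:30:00", "2011-08-14 10:30:00",
                      "2011-08-14 10:31:00", "2011-08-14 10:31:00",
                      "2011-08-14 10:31:00", "2011-08-14 10:32:00"),
                    tz = "UTC"),
  node = factor(c(2, 2, 1, 4, 3, 2)),
  c    = c(0.051236, 0.081230, 0, 0.001356337, 0.001356337, 0)
)

# Mean of c for every (node, date) pair that occurs in the data
res <- aggregate(c ~ node + date, data = data, FUN = mean)

# Inspect the column names and types of the result
str(res)
```

The result is a data frame in long form with the columns `node`, `date` and `c`, one row per node/date pair present in the data, which seems to be the layout the question asks for.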
来,给爷笑一个
#3 · 2019-04-11 18:46

If you want output that's a data frame with three columns, you probably would benefit from looking at the plyr package (assuming your data are stored in dat):

library(plyr)
ddply(dat,.(date,node),summarise,m = mean(c))
再贱就再见
#4 · 2019-04-11 18:46

Instead of tapply() you could use ave(), which returns a vector the same length as its input, so every row gets the mean of its group:

data$grp.mean <- ave(data$c, list(data$node, data$date), FUN = mean)
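If one row per group is wanted, the per-row means can be collapsed afterwards. Below is a minimal sketch on the question's sample rows; the UTC time zone, the name `summary_df` and the `unique()` step are assumptions for illustration, not part of the answer above.

```r
# Sample rows copied from the question; tz = "UTC" is an assumption
data <- data.frame(
  date = as.POSIXct(c("2011-08-14 10:30:00", "2011-08-14 10:30:00",
                      "2011-08-14 10:31:00", "2011-08-14 10:31:00",
                      "2011-08-14 10:31:00", "2011-08-14 10:32:00"),
                    tz = "UTC"),
  node = factor(c(2, 2, 1, 4, 3, 2)),
  c    = c(0.051236, 0.081230, 0, 0.001356337, 0.001356337, 0)
)

# ave() keeps the original length: each row receives its group's mean
data$grp.mean <- ave(data$c, data$node, data$date, FUN = mean)

# Drop the raw values and the repeated rows to get one row per group
# (summary_df is a hypothetical name)
summary_df <- unique(data[, c("date", "node", "grp.mean")])
print(summary_df)
```

This keeps the original `date` and `node` columns untouched, at the cost of an extra de-duplication step compared with an aggregating function.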

Looking at this again, I wonder whether you wanted the aggregation done by "date" in the calendar sense, that is, per 24-hour day?

If you wanted to reuse the tapply() result you already have (assuming it is named "M"), you might try:

require(reshape2)
newdf <- melt(t(M))
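A base R route to the same long layout, without reshape2, might look like the sketch below. It is a swapped-in alternative rather than the melt() call above. It rebuilds `M` from the question's sample rows; the UTC time zone, the column names and the NA filtering are assumptions for illustration.

```r
# Sample rows copied from the question; tz = "UTC" is an assumption
data <- data.frame(
  date = as.POSIXct(c("2011-08-14 10:30:00", "2011-08-14 10:30:00",
                      "2011-08-14 10:31:00", "2011-08-14 10:31:00",
                      "2011-08-14 10:31:00", "2011-08-14 10:32:00"),
                    tz = "UTC"),
  node = factor(c(2, 2, 1, 4, 3, 2)),
  c    = c(0.051236, 0.081230, 0, 0.001356337, 0.001356337, 0)
)

# The matrix from the question: nodes in rows, datetimes in columns
M <- tapply(data$c, list(data$node, data$date), mean)

# Unfold the matrix into one row per cell, keeping the labels as text
long <- as.data.frame(as.table(M), responseName = "c",
                      stringsAsFactors = FALSE)
names(long)[1:2] <- c("node", "date")

# Cells for node/date pairs that never occur are NA; drop them
long <- long[!is.na(long$c), ]

# The dimnames are character labels, so parse them back to POSIXct
# (assumes the labels are in a format as.POSIXct() can read)
long$date <- as.POSIXct(long$date, tz = "UTC")
long$node <- factor(long$node)
```

Because the matrix stores the datetimes only as text labels, the conversion back to POSIXct has to be done by hand here, which is one reason an aggregating function on the original data frame may be simpler.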