Question:
I want to get the mean of var1 and var2 by group, where the groups are defined by low and high.
How can I compute the mean of each of the two variables for each group (low and high)?
ID var1 var2 low high
1 1 6 0 1
2 2 7 0 1
3 3 8 1 0
4 4 9 1 0
5 5 10 0 1
Answer 1:
aggregate does what you need, given the proper input.
To aggregate multiple columns, you can cbind them so that they are separate columns in the result:
aggregate(cbind(var1, var2) ~ low+high, data=x, FUN=mean)
## low high var1 var2
## 1 1 0 3.500000 8.500000
## 2 0 1 2.666667 7.666667
If you want to take the mean of every column other than low and high, . is handy, meaning "all other columns":
aggregate(. ~ low+high, data=x, FUN=mean)
## low high ID var1 var2
## 1 1 0 3.500000 3.500000 8.500000
## 2 0 1 2.666667 2.666667 7.666667
Note that + has a special meaning in the formula when it is on the right side of the ~. There it does not mean a sum; it means grouping by both factors. On the left side, it means ordinary addition.
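To make the two meanings of + concrete, here is a minimal sketch. It rebuilds the question's table as a data frame named x (the name matches the calls above; how the asker actually loaded the data is an assumption) and puts + on both sides of the ~:

```r
# Rebuild the question's data; the name x matches the aggregate() calls above.
x <- data.frame(
  ID   = 1:5,
  var1 = 1:5,
  var2 = 6:10,
  low  = c(0, 0, 1, 1, 0),
  high = c(1, 1, 0, 0, 1)
)

# Right of ~ : "low + high" means "group by both low and high".
# Left of ~  : "var1 + var2" is arithmetic, evaluated row by row
#              before the mean is taken within each group.
sum_means <- aggregate(var1 + var2 ~ low + high, data = x, FUN = mean)
print(sum_means)
```

The third column of sum_means holds the per-group mean of var1 + var2, which should agree with the mean_sumvars values shown in the other answers.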
Answer 2:
A dplyr solution:
ID <- 1:5
var1 <- 1:5
var2 <- 6:10
low <- c(0, 0, 1, 1, 0)
high <- c(1, 1, 0, 0, 1)
mydf <- data.frame(ID, var1, var2, low, high)
library(dplyr)
mydf %>%
  group_by(low, high) %>%
  summarise(mean_var1 = mean(var1), mean_var2 = mean(var2))
which gives you:
low high mean_var1 mean_var2
1 0 1 2.666667 7.666667
2 1 0 3.500000 8.500000
As Richard Scriven points out, you might instead want the mean of the sum of var1 and var2, in which case:
library(dplyr)
mydf %>%
  mutate(sum_vars = var1 + var2) %>%
  group_by(low, high) %>%
  summarise(mean_sumvars = mean(sum_vars))
low high mean_sumvars
1 0 1 10.33333
2 1 0 12.00000
Answer 3:
Here is an option using data.table:
library(data.table)
setDT(df1)[, lapply(.SD, mean) ,.(low, high), .SDcols = var1:var2]
# low high var1 var2
#1: 0 1 2.666667 7.666667
#2: 1 0 3.500000 8.500000
And for the second case (the mean of var1 + var2, computed here as the sum of the two group means):
setDT(df1)[, .(sumvars = Reduce(`+`, lapply(.SD, mean))) ,.(low, high), .SDcols = var1:var2]
# low high sumvars
#1: 0 1 10.33333
#2: 1 0 12.00000
Answer 4:
For individual variables, tapply is also very convenient, especially when there are several grouping variables:
> with(dat, tapply(var1, list(low, high), mean))
    0        1
0  NA 2.666667
1 3.5       NA

> with(dat, tapply(var2, list(low, high), mean))
    0        1
0  NA 7.666667
1 8.5       NA

> with(dat, tapply(var1 + var2, list(low, high), mean))
   0        1
0 NA 10.33333
1 12       NA
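The NA cells above are the low/high combinations that have no rows in the data. As a rough sketch, the tapply matrix can be turned into a long table with those empty cells dropped. The data frame dat is rebuilt here from the question's table, and the column name mean_var1 is chosen only for illustration:

```r
# Rebuild the question's data under the name dat used in the calls above.
dat <- data.frame(
  var1 = 1:5,
  var2 = 6:10,
  low  = c(0, 0, 1, 1, 0),
  high = c(1, 1, 0, 0, 1)
)

# Naming the list elements labels the dimensions of the result.
m <- with(dat, tapply(var1, list(low = low, high = high), mean))

# Convert the low-by-high matrix to one row per cell,
# then drop the combinations that never occur (NA means).
long <- na.omit(as.data.frame(as.table(m), responseName = "mean_var1"))
print(long)
```

In the long table, low and high come back as factors, so convert them with as.character or as.numeric(as.character(...)) if numeric codes are needed later.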