Get the means of subgroups of means in R

Posted 2019-01-15 17:57

Question:

I'm new to R and I don't know how to make R calculate the means of subgroups of values that are themselves subgroup means. Let me explain more clearly.

I have a data frame like this:

GROUP WORD WLN
1     1    4
1     1    3
1     1    3
1     2    2
1     2    2
1     2    3
2     3    1
2     3    1
2     3    2
2     4    1
2     4    1
2     4    1
...   ...  ...

but the real one has a total of 5 groups and 25 words (5 words per group; every word has been assigned a number from 1 to 4 by 5 subjects...).

I need the mean of WLN for every word, which I can do easily with a loop, saving the results in a vector. But then I need a vector with the means of these means, grouped by the group each word belongs to. So I need the mean of the word means for group 1, then for group 2, and so on (I don't know if I'm making this clear).

How can I get this without doing it one group at a time?

Answer 1:

With base R, using aggregate:

> aggregate(WLN~GROUP+WORD, mean, data=df)
  GROUP WORD      WLN
1     1    1 3.333333
2     1    2 2.333333
3     2    3 1.333333
4     2    4 1.000000

where df is @Metrics' data (see the structure() call in Answer 2).

Another alternative is summaryBy from the doBy package:

> library(doBy)
> summaryBy(WLN~GROUP+WORD, FUN=mean, data=df)
  GROUP WORD WLN.mean
1     1    1 3.333333
2     1    2 2.333333
3     2    3 1.333333
4     2    4 1.000000
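
The output above gives one mean per word, while the question also asks for the mean of those word means within each group. A minimal sketch of that second step, assuming the 12 sample rows from the question (the names word_means and group_means are made up for illustration), is to run aggregate a second time on the result of the first:

```r
# Sample data: only the 12 rows shown in the question
df <- data.frame(
  GROUP = rep(1:2, each = 6),
  WORD  = rep(1:4, each = 3),
  WLN   = c(4, 3, 3, 2, 2, 3, 1, 1, 2, 1, 1, 1)
)

# Step 1: mean WLN per word, keeping GROUP so it can be reused
word_means <- aggregate(WLN ~ GROUP + WORD, data = df, FUN = mean)

# Step 2: mean of the word means within each group
group_means <- aggregate(WLN ~ GROUP, data = word_means, FUN = mean)
group_means
```

When every word has the same number of observations, as in the question, the mean of the word means equals the plain group mean of WLN; the two-step form matters only when the counts differ.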


Answer 2:

Assume df is your data frame:

df<-structure(list(GROUP = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L), WORD = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L, 4L), WLN = c(4L, 3L, 3L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 1L, 
1L)), .Names = c("GROUP", "WORD", "WLN"), class = "data.frame", row.names = c(NA, 
-12L))

plyr solution:

install.packages("plyr")
library(plyr)
ddply(df,.(GROUP,WORD),summarize, meanwln=mean(WLN))
  GROUP WORD  meanwln
1     1    1 3.333333
2     1    2 2.333333
3     2    3 1.333333
4     2    4 1.000000

data.table solution:

install.packages("data.table")
library(data.table)
df<-data.table(df)
setkey(df,GROUP,WORD)
df[,list(meanwln=mean(WLN)),by="GROUP,WORD"]

   GROUP WORD  meanwln
1:     1    1 3.333333
2:     1    2 2.333333
3:     2    3 1.333333
4:     2    4 1.000000


Answer 3:

With base R:

with(df,tapply(WLN,list(GROUP,WORD),mean))

Edit:

If you also want the row and column means for the table above, you could do something like this:

x <- with(df,tapply(WLN,list(GROUP,WORD),mean))
addmargins(x, margin = seq_along(dim(x)), FUN = mean, quiet = TRUE)
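
One caveat with this table: each word belongs to only one group, so tapply leaves NA in the cells for every other group, and a plain mean over a row or column containing NA returns NA. A hedged alternative sketch, again assuming the 12 sample rows from the question (group_means and word_means are illustrative names), is to take the margins with rowMeans and colMeans and drop the empty cells:

```r
# Sample data: only the 12 rows shown in the question
df <- data.frame(
  GROUP = rep(1:2, each = 6),
  WORD  = rep(1:4, each = 3),
  WLN   = c(4, 3, 3, 2, 2, 3, 1, 1, 2, 1, 1, 1)
)

# GROUP x WORD matrix of means; combinations that never occur are NA
x <- with(df, tapply(WLN, list(GROUP, WORD), mean))

# Ignore the NA cells when averaging across each margin
group_means <- rowMeans(x, na.rm = TRUE)  # mean of word means, one per GROUP
word_means  <- colMeans(x, na.rm = TRUE)  # mean WLN, one per WORD
group_means
```

Here group_means is a named numeric vector with one entry per group, which is the "vector of means of means" the question asks for.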


Answer 4:

And now dplyr is even better...

library(dplyr)
# Group the rows by WORD, then compute the row count and mean WLN per word
tmp <- group_by(df, WORD)
df1 <- summarise(tmp,
   count = n(),
   mWLN = mean(WLN, na.rm = TRUE))
df1
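
The snippet above stops at the per-word means. To get the per-group mean of those means with dplyr, one possible sketch is to summarise twice: first by GROUP and WORD, then by GROUP alone. This assumes the 12 sample rows from the question and a dplyr version that supports the .groups argument of summarise (1.0.0 or later); the names word_means, group_means and mean_of_means are made up for illustration.

```r
library(dplyr)

# Sample data: only the 12 rows shown in the question
df <- data.frame(
  GROUP = rep(1:2, each = 6),
  WORD  = rep(1:4, each = 3),
  WLN   = c(4, 3, 3, 2, 2, 3, 1, 1, 2, 1, 1, 1)
)

# Step 1: mean WLN per word, keeping GROUP as a column
word_means <- df %>%
  group_by(GROUP, WORD) %>%
  summarise(mWLN = mean(WLN, na.rm = TRUE), .groups = "drop")

# Step 2: mean of the word means within each group
group_means <- word_means %>%
  group_by(GROUP) %>%
  summarise(mean_of_means = mean(mWLN))
group_means
```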