I'm a newbie of R and I don't know how to get R calculate the means of a subgroups of means which are the means of a subgroup themselves. I'll explain clearer.
I have a data frame like this:
GROUP WORD WLN
1 1 4
1 1 3
1 1 3
1 2 2
1 2 2
1 2 3
2 3 1
2 3 1
2 3 2
2 4 1
2 4 1
2 4 1
... ... ...
but the real one has a total of 5 groups and 25 words (5 words each group; every word has being assigned a number from 1 to 4 by 5 subjects...).
I need to get the means of WLN for every word and I can do that easily with a loop and save the results in a vector; but then I need a vector with the means of these means according to the group which the words belong to... So I need the means of means of words of the group 1, then of group 2, etc... (I don't know if I'm making it clear).
How can I get this without doing it one group by one?
With base, using aggregate
> aggregate(WLN~GROUP+WORD, mean, data=df)
GROUP WORD WLN
1 1 1 3.333333
2 1 2 2.333333
3 2 3 1.333333
4 2 4 1.000000
where df
is @Metrics' data.
Another alternative is using summaryBy
from doBy package
> library(doBy)
> summaryBy(WLN~GROUP+WORD, FUN=mean, data=df)
GROUP WORD WLN.mean
1 1 1 3.333333
2 1 2 2.333333
3 2 3 1.333333
4 2 4 1.000000
Assume df is your dataframe:
df<-structure(list(GROUP = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L), WORD = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
4L, 4L), WLN = c(4L, 3L, 3L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 1L,
1L)), .Names = c("GROUP", "WORD", "WLN"), class = "data.frame", row.names = c(NA,
-12L))
Plyr solution
install.packages("plyr")
library(plyr)
ddply(df,.(GROUP,WORD),summarize, meanwln=mean(WLN))
GROUP WORD meanwln
1 1 1 3.333333
2 1 2 2.333333
3 2 3 1.333333
4 2 4 1.000000
Data.table solution:
install.packages("data.table")
library(data.table)
df<-data.table(df)
setkey(df,GROUP,WORD)
df[,list(meanwln=mean(WLN)),by="GROUP,WORD"]
GROUP WORD meanwln
1: 1 1 3.333333
2: 1 2 2.333333
3: 2 3 1.333333
4: 2 4 1.000000
with base:
with(df,tapply(WLN,list(GROUP,WORD),mean))
Edit:
If you also want row- and colmeans for the table above, you could do something like this:
x <- with(df,tapply(WLN,list(GROUP,WORD),mean))
addmargins(x, margin = seq_along(dim(x)), FUN = mean, quiet = TRUE)
And now dplyr is even better...
require(dplyr)
tmp <- group_by(df, WORD)
df1 <- summarise(tmp,
count = n(),
mWLN = mean(WLN, na.rm = TRUE))
df1