Stata has a very nice command, egen, which makes it easy to compute statistics over groups of observations. For instance, it is possible to compute the max, the mean and the min for each group and add them as variables in the detailed data set. The Stata command is one line of code:
by group : egen max = max(x)
I've never found an equivalent command in R. summarise in the dplyr package makes it easy to compute statistics for each group, but then I have to run a loop to attach each group's statistic to its observations:
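library("dplyr")
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))
table(tf$group)  # number of observations in each group
# one row per group holding the group maximum
# (group_by() accepts a plain data.frame; the old tbl_df() wrapper is deprecated)
mtf <- summarise(group_by(tf, group), max = max(x))
# copy each group's maximum back onto the matching observations
tf$max <- NA
for (i in seq_len(nrow(mtf))) {
  tf$max[tf$group == mtf$group[i]] <- mtf$max[i]
}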
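If you keep the two-step "summarise, then attach" pattern, the loop itself can be replaced by a single vectorized lookup with match(). The sketch below is an illustration, not code from the question: to stay base-R only it assumes aggregate() in place of the dplyr summarise() call, but the final match() line works the same way on the mtf tibble built above.

```r
set.seed(1)  # assumption: seed fixed only so the example is reproducible
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))

# Assumption: aggregate() stands in for summarise(group_by(...)) so only base R is needed
mtf <- aggregate(x ~ group, data = tf, FUN = max)
names(mtf)[2] <- "max"

# match() gives, for every row of tf, the row of mtf that holds its group,
# so indexing mtf$max with it copies the group maximum to each observation
tf$max <- mtf$max[match(tf$group, mtf$group)]
head(tf)
```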
Does anyone have a better solution?
Here are a few approaches:
dplyr
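A minimal sketch of the dplyr route, assuming dplyr 1.0 or later (this is an illustration, not necessarily the answer's original code). The key point is that mutate() after group_by() evaluates max(x) within each group and recycles the result to every row of that group, so no loop or join is needed. The seed and the extra mean/min columns are additions for the example, mirroring the max, mean and min mentioned in the question.

```r
library(dplyr)

set.seed(1)  # assumption: seed fixed only so the example is reproducible
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))

# mutate() (unlike summarise()) keeps one row per observation;
# each statistic is computed per group and repeated on that group's rows
tf <- tf %>%
  group_by(group) %>%
  mutate(max = max(x), mean = mean(x), min = min(x)) %>%
  ungroup()

head(tf)
```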
ave
This uses only base R:
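A minimal sketch with base R's ave(), offered as an illustration rather than the answer's exact code. ave(x, group, FUN = f) applies f within each level of group and returns a vector the same length as x, aligned with the original rows, which is exactly the shape egen produces. The seed and the mean/min columns are assumptions added for the example.

```r
set.seed(1)  # assumption: seed fixed only so the example is reproducible
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))

# each call returns one value per row, already matched to that row's group
tf$max  <- ave(tf$x, tf$group, FUN = max)
tf$mean <- ave(tf$x, tf$group)            # FUN defaults to mean
tf$min  <- ave(tf$x, tf$group, FUN = min)

head(tf)
```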
data.table
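A minimal sketch with data.table, assuming the data.table package is installed (an illustration, not necessarily the answer's original code). setDT() converts the data frame in place, and := with by = group adds the per-group statistics as new columns by reference, recycling each group's value to all of its rows. The seed and the mean/min columns are assumptions added for the example.

```r
library(data.table)

set.seed(1)  # assumption: seed fixed only so the example is reproducible
N <- 1000
tf <- data.frame(group = sample(1:100, size = N, replace = TRUE), x = rnorm(N))

setDT(tf)  # convert to a data.table in place, without copying

# := adds columns by reference; by = group evaluates the right-hand side per group
tf[, `:=`(max = max(x), mean = mean(x), min = min(x)), by = group]

head(tf)
```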