I have a large dataset on which I would like to perform post hoc comparisons:
dat = as.data.frame(matrix(runif(10000*300), ncol = 10000, nrow = 300))
dat$group = rep(letters[1:3], 100)
Here is my code:
start <- Sys.time()
vars <- names(dat)[-ncol(dat)]
aov.out <- lapply(vars, function(x) {
lm(substitute(i ~ group, list(i = as.name(x))), data = dat)})
TukeyHSD.out <- lapply(aov.out, function(x) TukeyHSD(aov(x)))
Sys.time() - start
Time difference of 4.033335 mins
It takes about 4 minutes. Is there a more efficient and elegant way to run post hoc tests in R?
Thanks a lot
Your example is too big. To illustrate the idea I will use a small one.
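A small data set in the same spirit could look like the sketch below. The column names y1, y2, y3 and f, the sizes and the seed are all made up for illustration; only the layout (several numeric responses plus one three-level grouping factor) follows the question.

```r
set.seed(0)
# Toy data: three numeric responses and one balanced three-level factor.
# Names and sizes are hypothetical, chosen only to keep the example small.
dat <- data.frame(y1 = rnorm(60), y2 = rnorm(60), y3 = rnorm(60),
                  f = factor(rep(letters[1:3], 20)))
str(dat)
```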
Why do you call aov on a fitted "lm" model? That basically refits the same model. Have a read of "Fitting a linear model with multiple LHS" first.
lm is the workhorse of aov, so you can pass a multiple LHS formula to aov. The model has class c("maov", "aov", "mlm", "lm").
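As a minimal sketch of such a multiple LHS fit, assuming a toy data frame with hypothetical columns y1, y2, y3 and a factor f:

```r
set.seed(0)
# Toy data (hypothetical names): three responses, one three-level factor.
dat <- data.frame(y1 = rnorm(60), y2 = rnorm(60), y3 = rnorm(60),
                  f = factor(rep(letters[1:3], 20)))

# A single call fits all three responses against the same factor.
fit <- aov(cbind(y1, y2, y3) ~ f, data = dat)
class(fit)

# Coefficients come back as a matrix with one column per response.
dim(coef(fit))
```

The design matrix is built and factorised once for all responses, which is where the saving over one lm call per column comes from.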
Now the issue is: there is no "maov" method for TukeyHSD, so we need a hack. TukeyHSD relies on the residuals of the fitted model. In the c("aov", "lm") case the residuals are a vector, but in the c("maov", "aov", "mlm", "lm") case they are a matrix. The following demonstrates the hack.
I have used a "for" loop. Replace it with lapply if you want.
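To make the column-by-column idea concrete, here is a rough sketch that takes one column of the residual matrix at a time and computes the Tukey quantities by hand with qtukey and ptukey. Note that this is a direct computation rather than a patched object passed to TukeyHSD, and it assumes a one-way layout with a single factor and no missing values. The data, the helper name tukey_one and the other variable names are hypothetical.

```r
set.seed(0)
# Toy data (hypothetical names): three responses, one three-level factor.
dat <- data.frame(y1 = rnorm(60), y2 = rnorm(60), y3 = rnorm(60),
                  f = factor(rep(letters[1:3], 20)))
fit <- aov(cbind(y1, y2, y3) ~ f, data = dat)

res <- residuals(fit)                    # matrix: one column per response
Y   <- as.matrix(dat[c("y1", "y2", "y3")])
dfr <- fit$df.residual                   # residual df, same for every column
n   <- as.vector(table(dat$f))           # group sizes
k   <- nlevels(dat$f)                    # number of group means
keep <- lower.tri(diag(k))               # pairs b-a, c-a, c-b
se_factor  <- sqrt(outer(1 / n, 1 / n, "+") / 2)[keep]
pair_names <- outer(levels(dat$f), levels(dat$f), paste, sep = "-")[keep]

# Hypothetical helper: Tukey HSD table for response column i.
tukey_one <- function(i, conf.level = 0.95) {
  mse    <- sum(res[, i]^2) / dfr        # residual mean square for column i
  means  <- as.vector(tapply(Y[, i], dat$f, mean))
  center <- outer(means, means, "-")[keep]
  se     <- sqrt(mse) * se_factor
  width  <- qtukey(conf.level, k, dfr) * se
  p      <- ptukey(abs(center) / se, k, dfr, lower.tail = FALSE)
  out <- cbind(diff = center, lwr = center - width,
               upr = center + width, `p adj` = p)
  rownames(out) <- pair_names
  out
}

result <- lapply(seq_len(ncol(res)), tukey_one)
names(result) <- colnames(res)
result[[1]]
```

Each element of result should agree with TukeyHSD(aov(y ~ f, data = dat))$f for the corresponding response, which is an easy sanity check on a few columns before running it over all of them.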