Lately I have been trying to fit a lot of random effects models to relatively big datasets. Let's say about 50,000 people (or more) observed at up to 25 time points. With such a large sample size, we include a lot of predictors to adjust for, maybe 50 or so fixed effects. I'm fitting the model to a binary outcome using lme4::glmer in R, with random intercepts for each subject. I can't go into specifics on the data, but the basic format of the glmer command I used was:
fit <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
             family = "binomial", data = dat)
where both study_quarter and dd_quarter are factors with about 20 levels each.
When I try to fit this model in R, it runs for about 12-15 hours and then returns an error saying that it failed to converge. I did a bunch of troubleshooting (e.g., following these guidelines) with no improvement, and the fit isn't even close to converging in the end (max gradient around 5-10, whereas the convergence criterion is 0.001, I think).
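(For context, the guidelines mostly suggest things like switching the optimizer and raising its iteration limit; a sketch of that kind of call, with purely illustrative settings rather than my exact ones, would be:)

library(lme4)

# Illustrative convergence troubleshooting only: swap in the bobyqa optimizer
# and raise its function-evaluation limit via glmerControl().
# "fit_try" is just a placeholder name.
fit_try <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
                 family = "binomial", data = dat,
                 control = glmerControl(optimizer = "bobyqa",
                                        optCtrl = list(maxfun = 2e5)))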
I then tried fitting the model in Stata, using the melogit command. The model fit in under 2 mins, with no convergence issues. The corresponding Stata command is
melogit outcome treatment i.study_quarter i.dd_quarter || id:
What gives? Does Stata just have a better fitting algorithm, or one better optimized for large models and large datasets? It’s really surprising how different the run times were.
The glmer fit will probably be much faster with the optional argument nAGQ=0L. You have many fixed-effects parameters (20 levels for each of study_quarter and dd_quarter generate 38 contrasts between them) and the default optimization method (corresponding to nAGQ=1L) puts all of those coefficients into the general nonlinear optimization call. With nAGQ=0L these coefficients are instead optimized within the much faster penalized iteratively reweighted least squares (PIRLS) algorithm. The default will generally provide a better estimate, in the sense that the deviance at the estimate is lower, but the difference is usually very small and the time difference is enormous.
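For the call in the question this amounts to adding a single argument (sketched here with the variable names from that call; "fit0" is just a placeholder name):

# Same model as in the question; nAGQ = 0L keeps the fixed-effects
# coefficients inside the PIRLS step instead of the nonlinear optimizer.
fit0 <- glmer(outcome ~ treatment + study_quarter + dd_quarter + (1|id),
              family = "binomial", data = dat, nAGQ = 0L)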
I have a write-up of the differences in these algorithms as a Jupyter notebook, nAGQ.ipynb. That write-up uses the MixedModels package for Julia instead of lme4, but the methods are similar. (I am one of the authors of lme4 and the author of MixedModels.)

If you are going to be doing a lot of GLMM fits, I would consider doing so in Julia with MixedModels. It is often much faster than R, even with all the complicated code in lme4.

Are you sure that Stata is reading in the whole file?
http://www.stata.com/manuals13/rlimits.pdf
The reason I ask is that it sounds like you've got 50k persons observed 25 times (1,250,000 rows); depending on the version of Stata you're using, you could be getting truncated results.
EDIT: Since it's not a file-length issue, have you tried other mixed-effects packages like nlme? I suspect the nonlinear mixed-effects machinery would handle your data somewhat faster.
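nlme itself fits Gaussian (linear and nonlinear) mixed models, so for a binary outcome one route through that machinery is MASS::glmmPQL, which wraps nlme's lme with penalized quasi-likelihood. A rough sketch, untested at this scale and only an approximate-likelihood method ("fit_pql" is just a placeholder name):

library(MASS)   # glmmPQL builds on nlme

# Approximate (penalized quasi-likelihood) fit of the same model.
fit_pql <- glmmPQL(outcome ~ treatment + study_quarter + dd_quarter,
                   random = ~ 1 | id,
                   family = binomial, data = dat)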
EDIT: This resource about the different mixed-model options may be more helpful than anything: https://stats.stackexchange.com/questions/173813/r-mixed-models-lme-lmer-or-both-which-one-is-relevant-for-my-data-and-why