I would like to share some of my experience trying to reduce the fitting time of a linear mixed-effects model in R using the lme4 package.
Dataset Size: The dataset consists of approximately 400,000 rows and 32 columns. Unfortunately, no information can be shared about the nature of the data.
Assumptions and Checks: The response variable is assumed to follow a Normal distribution. Before fitting, the variables were checked for collinearity and multicollinearity using correlation tables and R's alias function.
Continuous variables were scaled to help convergence.
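For reference, a minimal sketch of the scaling step with base R's scale(); the column names and values here are placeholders, not the real (undisclosed) data:

```r
# Placeholder data frame standing in for the real dataset.
data <- data.frame(Var1 = c(10, 20, 30, 40), Var2 = c(1, 5, 9, 13))

# Standardize the continuous predictors to mean 0 and sd 1.
num_vars <- c("Var1", "Var2")
data[num_vars] <- lapply(data[num_vars], function(x) as.numeric(scale(x)))
```

Centering and scaling like this puts the predictors on comparable scales, which tends to make the optimizer's job easier.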
Model Structure: The model equation contains 31 fixed effects (including the intercept) and 30 random slopes (no random intercept). The random effects vary by a single grouping factor with 2,700 levels. The covariance structure is Variance Components, since the random effects are assumed to be independent of one another.
Model equation example:
lmer(Response ~ 1 + Var1 + Var2 + ... + Var30 + (Var1 - 1 | Group) + (Var2 - 1 | Group) + ... + (Var30 - 1 | Group), data = data, REML = TRUE)
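Since all 30 random slopes are independent and share the same grouping factor, the same Variance Components model can also be written more compactly with lme4's double-bar syntax, which expands to separate uncorrelated terms (the ... stands for the elided variables, as above):

```r
library(lme4)  # assumes lme4 is installed

# (0 + Var1 + ... + Var30 || Group) is expanded by lme4 into
# (0 + Var1 | Group) + ... + (0 + Var30 | Group), i.e. independent
# random slopes with no estimated correlations between them.
fit <- lmer(Response ~ 1 + Var1 + Var2 + ... + Var30 +
              (0 + Var1 + Var2 + ... + Var30 || Group),
            data = data, REML = TRUE)
```

This is only a notational convenience; the fitted model and its covariance structure are the same as the expanded form.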
The model fitted successfully, but it took about 3.1 hours to produce results, while the same model in SAS took only a few seconds. The lme4 performance vignette describes how to reduce fitting time by switching to the non-linear optimizer nloptwrap and turning off the time-consuming derivative calculation that is performed after the optimization finishes (calc.derivs = FALSE):
https://cran.r-project.org/web/packages/lme4/vignettes/lmerperf.html
These changes reduced the fitting time by 78%.
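Concretely, the two vignette settings are passed through lmerControl (formula elided as above):

```r
library(lme4)  # assumes lme4 is installed

# Use the nloptwrap optimizer and skip the post-fit derivative
# calculation, as suggested in the lme4 performance vignette.
ctrl <- lmerControl(optimizer = "nloptwrap", calc.derivs = FALSE)

fit <- lmer(Response ~ 1 + Var1 + Var2 + ... + Var30 +
              (Var1 - 1 | Group) + ... + (Var30 - 1 | Group),
            data = data, REML = TRUE, control = ctrl)
```

Note that calc.derivs = FALSE skips the finite-difference Hessian check used for convergence diagnostics, so convergence warnings based on it will no longer be produced.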
Question: Is there any other way to reduce the model fitting time by choosing the lmer parameter inputs accordingly? The difference between R and SAS in fitting time is striking.
Any suggestion is appreciated.