Below is a simplified code reproducing what looks to me like a problem with R update
function:
library('lme4')
f <- function(formula) {
data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(2, 2))
fit0 <- glmer(formula, data = data, family = poisson(log))
fit1 <- update(fit0)
cat('f likelihoods: ', logLik(fit0), logLik(fit1), '\n')
}
g <- function() {
f(a ~ -1 + (1|rowi) + offset(b))
data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(20, 40))
f(a ~ -1 + (1|rowi) + offset(b))
cat('g likelihood: ', logLik(glmer(a ~ -1 + (1|rowi) + offset(b),
data = data, family = poisson(log))), '\n')
}
g()
data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(50, 80))
g()
cat('global likelihood: ', logLik(glmer(a ~ -1 + (1|rowi) + offset(b),
data = data, family = poisson(log))), '\n')
This code outputs:
f likelihoods: -4.712647 -4.712647
f likelihoods: -4.712647 -12.6914
g likelihood: -12.6914
f likelihoods: -4.712647 -14.22997
f likelihoods: -4.712647 -12.6914
g likelihood: -12.6914
global likelihood: -14.22997
The surprising (to me) thing is that update(fit0)
operation changes the model when data
is defined in the environment of the formula. Why is that? How to use update
properly to avoid pitfalls like this?
I've run into this, too. The short answer is that update.merMod(model)
employs environment(formula(model))
to determine which environment to use when refitting the model (if that fails, then it'll try the enclosing environment, and so on). The result is that update()
refits the model using the environment that the formula was created in, not the environment that the original merMod
object was created in. This is consistent with the behavior of the example you cooked up.
My clumsy way around this issue would be to pass the formulas around as strings, and be sure to convert to formulas inside the same function body where the model is originally fit; e.g.
f <- function(formula_string) {
formula <- as.formula(formula_string)
data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(2, 2))
fit0 <- glmer(formula, data = data, family = poisson(log))
fit1 <- update(fit0)
cat('f likelihoods: ', logLik(fit0), logLik(fit1), '\n')
}
g <- function() {
f("a ~ -1 + (1|rowi) + offset(b)")
data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(20, 40))
f("a ~ -1 + (1|rowi) + offset(b)")
cat('g likelihood: ', logLik(glmer(a ~ -1 + (1|rowi) + offset(b),
data = data, family = poisson(log))), '\n')
}
I'm not sure if the current behavior is desirable for some reason (that's a question for @benbolker and the other lme4
developers), or what a lower-level fix might look like ... aside from either explicitly setting/saving the environment of the merMod
object at creation, or a recursive frame search that uses identical()
(ala where()
in pryr
). There are probably good arguments against these.