Why does R update function changes glmer fit when

2019-07-18 08:23发布

问题:

Below is a simplified code reproducing what looks to me like a problem with R update function:

library('lme4')
f <- function(formula) {
  data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(2, 2))
  fit0 <- glmer(formula, data = data, family = poisson(log))
  fit1 <- update(fit0)
  cat('f likelihoods: ', logLik(fit0), logLik(fit1), '\n')
}
g <- function() {
  f(a ~ -1 + (1|rowi) + offset(b))
  data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(20, 40))
  f(a ~ -1 + (1|rowi) + offset(b))
  cat('g likelihood: ', logLik(glmer(a ~ -1 + (1|rowi) + offset(b),
      data = data, family = poisson(log))), '\n')
}
g()
data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(50, 80))
g()
cat('global likelihood: ', logLik(glmer(a ~ -1 + (1|rowi) + offset(b),
    data = data, family = poisson(log))), '\n')

This code outputs:

f likelihoods:  -4.712647 -4.712647 
f likelihoods:  -4.712647 -12.6914 
g likelihood:  -12.6914 
f likelihoods:  -4.712647 -14.22997 
f likelihoods:  -4.712647 -12.6914 
g likelihood:  -12.6914 
global likelihood:  -14.22997 

The surprising (to me) thing is that update(fit0) operation changes the model when data is defined in the environment of the formula. Why is that? How to use update properly to avoid pitfalls like this?

回答1:

I've run into this, too. The short answer is that update.merMod(model) employs environment(formula(model)) to determine which environment to use when refitting the model (if that fails, then it'll try the enclosing environment, and so on). The result is that update() refits the model using the environment that the formula was created in, not the environment that the original merMod object was created in. This is consistent with the behavior of the example you cooked up.

My clumsy way around this issue would be to pass the formulas around as strings, and be sure to convert to formulas inside the same function body where the model is originally fit; e.g.

f <- function(formula_string) {
  formula <- as.formula(formula_string)
  data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(2, 2))
  fit0 <- glmer(formula, data = data, family = poisson(log))
  fit1 <- update(fit0)
  cat('f likelihoods: ', logLik(fit0), logLik(fit1), '\n')
}
g <- function() {
  f("a ~ -1 + (1|rowi) + offset(b)")
  data <- data.frame(a = c(4, 5), rowi = c(1, 2), b = c(20, 40))
  f("a ~ -1 + (1|rowi) + offset(b)")
  cat('g likelihood: ', logLik(glmer(a ~ -1 + (1|rowi) + offset(b),
      data = data, family = poisson(log))), '\n')
}

I'm not sure if the current behavior is desirable for some reason (that's a question for @benbolker and the other lme4 developers), or what a lower-level fix might look like ... aside from either explicitly setting/saving the environment of the merMod object at creation, or a recursive frame search that uses identical() (ala where() in pryr). There are probably good arguments against these.



标签: r scope