使用一个用户定义的函数内步骤()当缺少对象错误(Missing object error when

2019-09-03 22:17发布

5 days and still no answer

  • As can be seen by Simon's comment, this is a reproducible and very strange issue. It seems that the issue only arises when a stepwise regression with very high predictive power is wrapped in a function.

I have been struggling with this for a while and any help would be much appreciated. I am trying to write a function that runs several stepwise regressions and outputs all of them to a list. However, R is having trouble reading the dataset that I specify in my function arguments. I found several similar errors on various boards (here, here, and here), however none of them seemed to ever get resolved. It all comes down to some weird issues with calling step() in a user-defined function. I am using the following script to test my code. Run the whole thing several times until an error arises (trust me, it will):

test.df <- data.frame(a = sample(0:1, 100, rep = T),
                      b = as.factor(sample(0:5, 100, rep = T)),
                      c = runif(100, 0, 100),
                      d = rnorm(100, 50, 50))
test.df$b[10:100] <- test.df$a[10:100] #making sure that at least one of the variables has some predictive power

stepModel <- function(modeling.formula, dataset, outfile = NULL) {
  if (is.null(outfile) == FALSE){
    sink(file = outfile,
         append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  sink()
  return(output)
}

blah <- stepModel(a~., dataset = test.df)

This returns the following error message (if the error does not show up right away, keep re-running the test.df script as well as the call for stepModel(), it will show up eventually):

Error in is.data.frame(data) : object 'dataset' not found

I have determined that everything runs fine up until model.stepwise2 starts to get built. Somehow, the temporary object 'dataset' works just fine for the first stepwise regression, but fails to be recognized by the second. I found this by commenting out part of the function as can be seen below. This code will run fine, proving that the object 'dataset' was originally being recognized:

stepModel1 <- function(modeling.formula, dataset, outfile = NULL) {
  if (is.null(outfile) == FALSE){
    sink(file = outfile,
         append = TRUE, type = "output")
    print("")
    print("Models run at:")
    print(Sys.time())
  }
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
#   model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
#   sink()
#   output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
  return(model.stepwise1)
}

blah1 <- stepModel1(a~., dataset = test.df) 

EDIT - before anyone asks, all the summary() functions were there because the full function (i edited it so that you could focus in on the error) has another piece that defines a file to which you can output stepwise trace. I just got rid of them

EDIT 2 - session info

sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sqldf_0.4-6.4         RSQLite.extfuns_0.0.1 RSQLite_0.11.3        chron_2.3-43         
 [5] gsubfn_0.6-5          proto_0.3-10          DBI_0.2-6             ggplot2_0.9.3.1      
 [9] caret_5.15-61         reshape2_1.2.2        lattice_0.20-6        foreach_1.4.0        
[13] cluster_1.14.2        plyr_1.8             

loaded via a namespace (and not attached):
 [1] codetools_0.2-8    colorspace_1.2-1   dichromat_2.0-0    digest_0.6.2       grid_2.15.1       
 [6] gtable_0.1.2       iterators_1.0.6    labeling_0.1       MASS_7.3-18        munsell_0.4       
[11] RColorBrewer_1.0-5 scales_0.2.3       stringr_0.6.2      tools_2.15

EDIT 3 - this performs all the same operations as the function, just without using a function. This will run fine every time, even when the algorithm doesn't converge:

modeling.formula <- a~.
dataset <- test.df
outfile <- NULL
if (is.null(outfile) == FALSE){
  sink(file = outfile,
       append = TRUE, type = "output")
  print("")
  print("Models run at:")
  print(Sys.time())
}
  model.initial <- glm(modeling.formula,
                       family = binomial,
                       data = dataset)
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  output <- list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)

Answer 1:

使用do.call来引用数据调用环境设置为我工作。 见https://stackoverflow.com/a/7668846/210673原始建议。 这里是可用的版本(与sink代码中删除)。

stepModel2 <- function(modeling.formula, dataset) {
  model.initial <- do.call("glm", list(modeling.formula,
                       family = "binomial",
                       data = as.name(dataset)))
  model.stepwise1 <- step(model.initial, direction = "backward")
  model.stepwise2 <- step(model.stepwise1, scope = ~.^2)
  list(modInitial = model.initial, modStep1 = model.stepwise1, modStep2 = model.stepwise2)
}

blah <- stepModel2(a~., dataset = "test.df")

它始终无法对我set.seed(6)与原来的代码。 失败的原因是, dataset变量不是内本step的功能,并且虽然它没有在制造需要model.stepwise1 ,它需要用于model.stepwise2model.stepwise1保持线性项。 这样的话,当你的版本会失败。 从调用全球环境的数据集,因为我在这里做修复此问题。



文章来源: Missing object error when using step() within a user-defined function
标签: r regression glm