-->

Force R function call to be self-sufficient

Posted 2019-07-22 03:04

Question:

I'm looking for a way to call a function that is not influenced by other objects in .GlobalEnv.

Take a look at the two functions below:

y = 3
f1 = function(x) x+y

f2 = function(x) {
   library(dplyr)
   x %>%
       mutate(area = Sepal.Length * Sepal.Width) %>%
       head()
}

In this case:

  • f1(5) should fail, because y is not defined in the function scope
  • f2(iris) should pass, because the function does not reference variables outside its scope

Now, I can overwrite the environment of f1 and f2, either to baseenv() or to new.env(parent=as.environment(2L)):

environment(f1) = baseenv()
environment(f2) = baseenv()
f1(3)    # fails, as it should
f2(iris) # fails, because %>% is not in function env

or:

# detaching here makes `dplyr` inaccessible for `f2`
# not detaching leaves `head` inaccessible for `f2`
detach("package:dplyr", unload=TRUE)
environment(f1) = new.env(parent=as.environment(2L))
environment(f2) = new.env(parent=as.environment(2L))
f1(3)    # fails, as it should
f2(iris) # fails, because %>% is not in function env

Is there a way to overwrite a function's environment so that it has to be self-sufficient, but it also always works as long as it loads its own libraries?

Answer 1:

The fundamental problem is that library and similar tools don’t provide scoping and aren’t designed to work with scopes (see footnote 1): even though library is executed inside the function, its effect is global, not local. Ugh.
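
To see the global effect for yourself, here is a minimal sketch. It assumes the tools package (which ships with R) is not already attached in your session; attach_locally is just a name made up for this illustration.

```r
# Assumption: 'tools' ships with R and is not attached by default.
attach_locally <- function() {
  library(tools)        # called inside the function body ...
  invisible(NULL)
}

before <- search()
attach_locally()
after <- search()

# ... yet the search path of the whole session has changed.
"package:tools" %in% after
setdiff(after, before)   # entries added by the call
```

The function returns nothing and defines nothing locally, but the session's search path is different afterwards.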

Specifically, your approach of isolating the function from the global environment is sound; however, library manipulates the search path (via attach), and the function’s environment isn’t “notified” of this: it will still point to the previous second search path entry as its grandparent.
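
The stale grandparent can be made visible with a small experiment. This is only a sketch: it uses splines (shipped with R) instead of dplyr so it runs without extra installs, and the names old_second, found_inside and inserted_in_front are made up for the illustration.

```r
# Start from a known state: make sure 'splines' is not attached yet.
if ("package:splines" %in% search()) detach("package:splines")

old_second <- as.environment(2L)   # second search path entry right now

f <- function() {
  library(splines)
  exists("bs")                     # bs() is exported by splines
}
environment(f) <- new.env(parent = old_second)

found_inside <- f()

# library() put package:splines in front of the old second entry,
# so the chain starting at old_second never passes through it.
inserted_in_front <- identical(parent.env(as.environment(2L)), old_second)

found_inside        # lookup from inside f
inserted_in_front   # where the new entry sits relative to old_second
exists("bs")        # lookup from the calling environment
```

Inside f the lookup walks f's frame, then the new environment, then old_second and everything behind it. The freshly attached package sits before old_second on the search path, so it is skipped.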

You need to find a way of updating the function environment’s grandparent environment when library/attach/… is called. You could achieve this by replacing library etc. in the function’s parent environment with your own versions that call a modified version of attach. This attach2 would then not only call the original attach but also relink your environment’s parent.
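
A rough sketch of that idea follows. It is a simplification, not the full attach2 approach: it only shadows library (not attach, require, etc.), and it relinks the sandbox after the original library has run. The helper name isolate is hypothetical. Note also that the R documentation describes parent.env<- as dangerous, so treat this as an experiment rather than something to rely on.

```r
# Hypothetical helper: returns `fn` with an enclosing environment that
# skips .GlobalEnv but follows packages attached through library().
isolate <- function(fn) {
  sandbox <- new.env(parent = as.environment(2L))

  # Shadow library() inside the sandbox only.
  sandbox$library <- function(...) {
    result <- base::library(...)
    # Whatever was attached now sits at position 2 (or further down the
    # search path), so relink the sandbox to the current second entry.
    parent.env(sandbox) <- as.environment(2L)
    invisible(result)
  }

  environment(fn) <- sandbox
  fn
}

y <- 3
f1 <- isolate(function(x) x + y)
f2 <- isolate(function(x) {
  library(splines)          # stand-in for library(dplyr)
  ncol(bs(x, df = 4))       # bs() comes from splines
})

f1_result <- try(f1(5), silent = TRUE)   # y is not reachable from f1
f2_result <- f2(1:20)                    # works: f2 attached what it needs
```

With this wrapper f1 fails because y lives in .GlobalEnv, which is no longer on its lookup chain, while f2 succeeds because it attaches its own package. The same should apply to the dplyr version of f2, assuming dplyr is installed.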


1 As an aside, the ‹modules› package fixes all of these problems. Replacing library(foo) with modules::import_package('foo', attach = TRUE) in your code makes it work, because modules are strongly scoped and environment-aware.