Environments in R, mapply and get

Posted 2020-04-16 17:18

Question:

Let x<-2 in the global env:

x <-2 
x
[1] 2

Let a be a function that defines another x locally and uses get:

a<-function(){
  x<-1
  get("x")
}

This function correctly gets x from the local environment:

a()
[1] 1

Now let's define a function b as below, that uses mapply with get:

b<-function(){
  x<-1
  mapply(get,"x")
}

If I call b, it seems that mapply makes get not search the function environment first. Instead, it tries to get x directly from the global environment, and if x is not defined in the global env, it gives an error message:

b()
x 
2 
rm(x)
b()
Error in (function (x, pos = -1L, envir = as.environment(pos), mode = "any",  : 
  object 'x' not found 

The solution to this is to explicitly define envir=environment().

c<-function(){
  x<-1
  mapply(get,"x", MoreArgs = list(envir=environment()))
}

c()
x 
1 

But I would like to know what exactly is going on here. What is mapply doing? (And why? Is this the expected behavior?) Is this "pitfall" common in other R functions?

Answer 1:

The problem is that get looks in the environment it is called from, but here we pass get to mapply, so get is called from the local environment within mapply. If x is not found in mapply's local environment, the search moves to the parent of that environment, i.e. environment(mapply), which is the lexical environment mapply was defined in, namely the base namespace environment. If x is not there either, the search moves to the parent of that, which is the global environment, i.e. your R workspace.

This is because R uses lexical scoping, as opposed to dynamic scoping.

We can show this by getting a variable that exists within mapply.

 x <- 2
 b2<-function(){
   x<-1
   mapply(get, "USE.NAMES")
 }
 b2() # it finds USE.NAMES in mapply
 ## USE.NAMES 
 ##     TRUE 
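
The same point can be checked without get. The following is a minimal sketch, where caller_env and f are hypothetical names introduced only for illustration: caller_env returns the frame it was called from, so we can compare a direct call with a call routed through mapply.

```r
# caller_env is a hypothetical helper: it returns the frame it was called from
caller_env <- function(...) parent.frame()

f <- function() {
  here <- environment()                        # the frame of f
  direct <- caller_env()                       # called directly from f
  # called from inside mapply; SIMPLIFY = FALSE keeps the result a plain list
  via_mapply <- mapply(caller_env, 1, SIMPLIFY = FALSE)[[1]]
  list(
    direct_is_f     = identical(direct, here),
    via_mapply_is_f = identical(via_mapply, here),
    mapply_locals   = ls(via_mapply)           # variables local to mapply
  )
}

res <- f()
res$direct_is_f       # TRUE: the direct call sees the frame of f
res$via_mapply_is_f   # FALSE: the call made by mapply sees mapply's own frame
res$mapply_locals     # includes "FUN", "dots", "MoreArgs", ...
```

This is the same frame that get uses as its default envir, which is why b2 above finds USE.NAMES.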

In addition to the workaround involving MoreArgs shown in the question, this also works, since it causes the search to continue into the local environment within b after failing to find x in mapply. (This is just to illustrate what is going on; in actual practice we would prefer the workaround shown in the question.)

x <- 2
b3 <-function(){
   x<-1
   environment(mapply) <- environment()
   mapply(get, "x")
}
b3()
## 1
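
When several local variables are needed by name, one option that avoids the default envir altogether is base R's mget with an explicit environment. This is only a sketch of an alternative, not something from the question; b5 and y are hypothetical names.

```r
x <- 2
b5 <- function() {
  x <- 1
  y <- 10
  # envir is passed explicitly, so the lookup does not depend on
  # which function ends up calling the lookup routine
  mget(c("x", "y"), envir = environment())
}

vals <- b5()
vals$x   # the local x, not the global one
vals$y
```

Because environment() is evaluated in the frame of b5, the lookup starts there regardless of any x in the workspace.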

ADDED Expanded explanation. Also note that we can view the chain of environments like this:

> debug(get)
> b()
debugging in: (function (x, pos = -1L, envir = as.environment(pos), mode = "any", 
    inherits = TRUE) 
.Internal(get(x, envir, mode, inherits)))(dots[[1L]][[1L]])
debug: .Internal(get(x, envir, mode, inherits))
Browse[2]> envir
<environment: 0x0000000021ada818>
Browse[2]> ls(envir) ### this shows that envir is the local env in mapply
[1] "dots"      "FUN"       "MoreArgs"  "SIMPLIFY"  "USE.NAMES"
Browse[2]> parent.env(envir) ### the parent of envir is the base namespace env
<environment: namespace:base>
Browse[2]> parent.env(parent.env(envir)) ### and grandparent of envir is the global env
<environment: R_GlobalEnv>

Thus, the ancestry of environments potentially followed is this (where each arrow points to the parent):

local environment within mapply --> environment(mapply) --> .GlobalEnv

where environment(mapply) equals asNamespace("base"), the base namespace environment.
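
The last two links of that chain can also be checked without the debugger. A small sketch follows; env_chain is a hypothetical helper written for this illustration, not a base R function.

```r
# env_chain is a hypothetical helper: it collects the names of an
# environment and all of its ancestors, stopping at the empty environment
env_chain <- function(env) {
  out <- character()
  while (!identical(env, emptyenv())) {
    out <- c(out, environmentName(env))
    env <- parent.env(env)
  }
  out
}

base_ns <- environment(mapply)
identical(base_ns, asNamespace("base"))       # TRUE
identical(parent.env(base_ns), globalenv())   # TRUE

chain <- env_chain(base_ns)
head(chain, 2)   # "base" then "R_GlobalEnv"; attached packages follow
```

After the global environment the chain continues through the attached packages on the search path, so the frame of the function that called mapply never appears in it.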



Answer 2:

R is lexically scoped, not dynamically scoped, meaning that when you search through parent environments to find a value, you are searching through the lexical parents (as written in the source code), not through the dynamic parents (as invoked). Consider this example:

x <- "Global!"
fun1 <- function() print(x)
fun2 <- function() {
  x <- "Local!"
  fun1a <- function() print(x)
  fun1()           # fun2() is dynamic but not lexical parent of fun1()
  fun1a()          # fun2() is both dynamic and lexical parent of fun1a() 
}
fun2()

outputs:

[1] "Global!"
[1] "Local!"

In this case fun2 is the lexical parent of fun1a, but not of fun1. Since mapply is not defined inside your functions, your functions are not the lexical parents of mapply and the xs defined therein are not directly accessible to mapply.
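
The same reasoning suggests another way to make the lookup work: wrap get in a function that is defined inside the calling function, so that the calling function becomes its lexical parent. A minimal sketch, with b4 as a hypothetical name:

```r
x <- 2
b4 <- function() {
  x <- 1
  # The anonymous function is defined here, so its enclosing environment
  # is the frame of b4. get starts in the anonymous function's frame and
  # then moves to that enclosing environment, where the local x lives.
  mapply(function(name) get(name), "x")
}

result <- b4()
result   # named vector: x = 1
```

Here mapply still makes the call, but get is now called from a frame whose lexical parent is b4, so the local x is found before the global one.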



Answer 3:

The issue is an interplay with built-in C code. Consider the following:

fx <- function(x) environment()
env <- NULL; fn <- function() { env <<- environment(); mapply(fx, 1)[[1]] }

Then

env2 <- fn()
identical(env2, env)
# [1] FALSE
identical(parent.env(env2), env)
# [1] FALSE
identical(parent.env(env2), globalenv())
# [1] TRUE

More specifically, the call to fx is built and evaluated by mapply's underlying C code, and the executing environment of fn plays no part in it. The evaluation frame created for fx is enclosed by the environment in which fx was defined, and since fx was defined at top level, that frame branches directly off R_GlobalEnv.

Note this really is what is going on, since no level of stack nesting fixes the issue:

env <- NULL; fn2 <- function() { env <<- environment(); (function() { mapply(fx, 1)[[1]] })() }
identical(parent.env(fn2()), globalenv())
# [1] TRUE
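
For comparison, here is a hedged sketch of the same experiment with the function defined inside the caller (fn3 and fx_local are hypothetical names). In this variant the parent of the returned environment is the caller's frame, which suggests that what matters is where the function passed to mapply was defined.

```r
fn3 <- function() {
  here <- environment()                  # the frame of fn3
  fx_local <- function(x) environment()  # defined inside fn3
  # SIMPLIFY = FALSE keeps the result a plain list of environments
  env_from_mapply <- mapply(fx_local, 1, SIMPLIFY = FALSE)[[1]]
  identical(parent.env(env_from_mapply), here)
}

same_parent <- fn3()
same_parent   # TRUE
```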