Lazy evaluation in R – is assign affected?

Posted 2019-03-17 23:40

Question:

I read this basic question on renaming objects and @Shane's answer to it, which pointed me to lazy evaluation. Now I wonder whether assign is evaluated lazily, too, as in this example:

assign("someNewName", someOldObject)
rm(someOldObject)

The reason I ask is the following use case: assume I have 10K+ R objects, each of which has two attributes called originalName and additionalName. I want to write a function that lets the user efficiently switch from one name to the other without losing these two attributes. Roughly like this:

EDIT: Based on @Hadley's input, I have changed my code.

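switchObjectName <- function(x) {
  # the two alternative names stored as attributes on the object
  n1 <- attributes(x)$originalName
  n2 <- attributes(x)$additionalName
  # the name of the variable the caller passed in, as a character string
  objName <- deparse(substitute(x))
  # bind the object (lazily) under whichever name it does not currently have
  if (objName == n1) {
    delayedAssign(n2, x, assign.env = .GlobalEnv)
  } else {
    delayedAssign(n1, x, assign.env = .GlobalEnv)
  }
  # remove the old binding; objName holds the name as a string, hence `list =`
  rm(list = c(objName), envir = .GlobalEnv)
}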

This works well, but I had quite some trouble getting the rm statement right. I tried rm(objName, envir=.GlobalEnv) but could not get it to work, even though objName is definitely a character string (it is the result of deparse(substitute(x))).
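The trouble with rm(objName, ...) comes from rm() quoting its bare arguments: it treats `objName` as the name of the variable to delete, not as a variable holding that name. The sketch below reproduces this in a scratch environment `e` (a stand-in for .GlobalEnv, chosen only for illustration); rm() with a bare name looks for a variable literally called "objName", while `list =` uses the string stored in it.

```r
e <- new.env()
assign("someOldObject", 1:3, envir = e)
objName <- "someOldObject"

# Bare argument: rm() looks for a variable literally named "objName" in e,
# finds none, and only issues a warning. someOldObject survives.
suppressWarnings(rm(objName, envir = e))
stopifnot(exists("someOldObject", envir = e, inherits = FALSE))

# `list =` takes a character vector of names, so the *value* of objName is used.
rm(list = objName, envir = e)
stopifnot(!exists("someOldObject", envir = e, inherits = FALSE))
```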

Answer 1:

The R language generally has value semantics. The assignment x <- y means that x and y will be independent copies of the same object (updates to x and y are independent of each other). A naive implementation of x <- y would always allocate memory for x and fully copy y into it. GNU-R instead uses a copy-on-write mechanism: it postpones the copy until an update actually happens, which saves memory and execution time when no update ever happens. R users do not have to know about this optimization; it is fully transparent (except in some rare cases, such as out-of-memory errors). The mechanism applies equally to assignments written as x <- y and as assign("x", y).
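A small sketch of the value semantics described above, covering both x <- y and assign(). The second half uses tracemem() to make the deferred copy visible; it is guarded by capabilities("profmem") because tracemem() only works when R was built with memory profiling (as standard CRAN builds usually are, but this is an assumption about your installation). The variable names are illustrative.

```r
y <- c(1, 2, 3)
x <- y                # in current GNU-R no data is copied here; x and y share one vector
assign("z", y)        # assign() shares the vector in the same way
x[1] <- 100           # modifying x triggers the real copy; y and z are untouched
stopifnot(x[1] == 100, y[1] == 1, z[1] == 1)

# Only if memory profiling is available: watch when the copy actually happens.
if (capabilities("profmem")) {
  a <- runif(5)
  tracemem(a)         # report future copies of this vector
  b <- a              # no tracemem message: nothing is copied yet
  b[1] <- 0           # tracemem prints a message here: the copy happens now
  untracemem(a)
}
```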

Lazy evaluation, on the other hand, is part of the design of the language and is visible to R users/programmers. Expressions passed as arguments to a function are evaluated lazily, only if and when the called function needs them; for example, in foo(ls()) the expression passed is ls(), and it is evaluated only when foo actually uses that argument.
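To make lazy argument evaluation concrete, here is a minimal sketch with two hypothetical helpers: `side_effect()` counts how often it runs, and `maybe_use()` only touches its argument when asked to. An argument that is never used is never evaluated, even if evaluating it would raise an error.

```r
calls <- 0
side_effect <- function() {
  calls <<- calls + 1   # count how often the argument expression is evaluated
  "value"
}
maybe_use <- function(arg, use) if (use) arg else "arg ignored"

maybe_use(side_effect(), use = FALSE)   # returns "arg ignored"
stopifnot(calls == 0)                   # side_effect() never ran

maybe_use(side_effect(), use = TRUE)    # returns "value"
stopifnot(calls == 1)

# An unused argument can even be an error: it is simply never evaluated.
maybe_use(stop("never evaluated"), use = FALSE)
```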

delayedAssign is a low-level function that is visible to R users/programmers, but it is really only used for lazy loading of packages and should not be needed in user programs. delayedAssign lets you specify an expression that computes the value of a variable; that expression is evaluated lazily, only if and when the variable is first read.
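The following sketch shows delayedAssign() storing an expression and running it only on first access. The `state` environment is just an illustrative device for recording whether the expression has run; it is not part of delayedAssign's API.

```r
state <- new.env()
state$forced <- FALSE

# Only the expression is stored; nothing is computed yet.
delayedAssign("v", {
  state$forced <- TRUE   # side effect that records when the expression runs
  sqrt(16)
})
stopifnot(!state$forced)

first_read <- v          # the first access evaluates the expression
stopifnot(state$forced, first_read == 4)
```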

So, to answer the question: an assignment in R is always "lazy" in the sense that the copy-on-write mechanism is used. The computation of the right-hand side of the assignment can also be lazy (using delayedAssign), but that should not be needed in user programs.
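The difference can be checked directly: assign() evaluates its value argument before creating the binding, whereas delayedAssign() postpones it. The sketch below uses stop() as the value so that the moment of evaluation is visible; the variable names `a` and `b` are arbitrary.

```r
# assign() evaluates its value right away: the error fires immediately,
# and no binding for "a" is created.
msg <- tryCatch(assign("a", stop("evaluated by assign()")),
                error = function(e) conditionMessage(e))
stopifnot(identical(msg, "evaluated by assign()"),
          !exists("a", inherits = FALSE))

# delayedAssign() only stores the expression; the error fires on first access.
delayedAssign("b", stop("evaluated on first access"))
msg2 <- tryCatch(b, error = function(e) conditionMessage(e))
stopifnot(identical(msg2, "evaluated on first access"))
```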

For "renaming" variables, I think there is no need to use delayedAssign, because the right-hand side is an existing object and there is nothing to compute. delayedAssign only makes the situation more complex, and there will likely be some performance overhead from the bookkeeping it has to do. If I had to rename variables, I would just use ordinary assignment.
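A possible version of the question's renaming function using ordinary assignment might look like the sketch below. `switch_object_name` is a hypothetical name; unlike the original it takes the object's name as a string and an explicit environment instead of relying on deparse(substitute(x)) and .GlobalEnv, which is a design choice made here for clarity and testability.

```r
# Hypothetical helper: rename the binding `obj_name` in `env` to the other
# name stored in its attributes, using ordinary assignment.
switch_object_name <- function(obj_name, env = globalenv()) {
  x <- get(obj_name, envir = env, inherits = FALSE)
  n1 <- attr(x, "originalName")
  n2 <- attr(x, "additionalName")
  new_name <- if (identical(obj_name, n1)) n2 else n1
  assign(new_name, x, envir = env)   # attributes travel with the value
  rm(list = obj_name, envir = env)   # `list =` takes the name as a string
  invisible(new_name)
}

# Usage in a scratch environment:
e <- new.env()
e$foo <- structure(1:3, originalName = "foo", additionalName = "bar")
switch_object_name("foo", env = e)
stopifnot(identical(ls(e), "bar"),
          identical(attr(e$bar, "originalName"), "foo"))
```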

For code clarity, I would also, whenever possible, avoid deleting variables from environments, and even assigning from a function into the global environment. Instead, I would create a new list and insert the new bindings (variables) into it.
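As a sketch of that list-based approach (assuming the objects are kept in a named list rather than as separate global variables), renaming becomes a pure function that returns a new list with switched names and leaves the caller's list untouched. `switch_names` and the example objects are hypothetical.

```r
objs <- list(
  foo = structure(1:3, originalName = "foo", additionalName = "bar"),
  baz = structure(letters[1:2], originalName = "baz", additionalName = "qux")
)

# Hypothetical helper: return a copy of `objs` whose names switch between
# each element's originalName and additionalName attributes.
switch_names <- function(objs) {
  new_names <- vapply(names(objs), function(nm) {
    x <- objs[[nm]]
    if (identical(nm, attr(x, "originalName"))) attr(x, "additionalName")
    else attr(x, "originalName")
  }, character(1), USE.NAMES = FALSE)
  names(objs) <- new_names
  objs
}

renamed <- switch_names(objs)
stopifnot(identical(names(renamed), c("bar", "qux")),
          identical(names(objs), c("foo", "baz")))  # original list unchanged
```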

Finally, regarding copy-on-write: with the current implementation of GNU-R, any of the solutions described here may cause memory copying that would not be necessary had the variables not been renamed. There is no way to avoid this at the R level.