Writing functions in R, keeping scoping in mind

Posted 2020-02-08 03:18

Question:

I often write functions that need to see other objects in my environment. For example:

> a <- 3
> b <- 3
> x <- 1:5
> fn1 <- function(x,a,b) a+b+x
> fn2 <- function(x) a+b+x
> fn1(x,a,b)
[1]  7  8  9 10 11
> fn2(x)
[1]  7  8  9 10 11

As expected, both functions return the same result, because fn2 can "see" a and b when it executes. But whenever I start to take advantage of this, within about 30 minutes I end up calling the function when one of the necessary variables (e.g. a or b) is missing. If I don't take advantage of this, then I feel like I am passing objects around unnecessarily.

Is it better to be explicit about what a function requires? Or should this be taken care of via inline comments or other documentation of the function? Is there a better way?

Answer 1:

If I know that I'm going to need a function parametrized by some values and called repeatedly, I avoid globals by using a closure:

make.fn2 <- function(a, b) {
    fn2 <- function(x) {
        return( x + a + b )
    }
    return( fn2 )
}

a <- 2; b <- 3
fn2.1 <- make.fn2(a, b)
fn2.1(3)    # 8
fn2.1(4)    # 9

a <- 4
fn2.2 <- make.fn2(a, b)
fn2.2(3)    # 10
fn2.1(3)    # 8

This neatly avoids referencing global variables, instead using the enclosing environment of the function for a and b. Modification of globals a and b doesn't lead to unintended side effects when fn2 instances are called.
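One caveat worth keeping in mind: R evaluates function arguments lazily, so a and b are only captured once the closure first uses them. The following is a minimal sketch, assuming a hypothetical variant called make.adder (not part of the answer above), that uses force() to pin the values at the moment the closure is created:

```r
# Hypothetical variant of make.fn2; force() evaluates the arguments
# immediately instead of on first use.
make.adder <- function(a, b) {
    force(a)
    force(b)
    function(x) x + a + b
}

a <- 2; b <- 3
add5 <- make.adder(a, b)
a <- 100                          # changed before add5 is ever called
res <- add5(3)                    # 8: the closure still holds a = 2
captured <- environment(add5)$a   # 2: value stored in the enclosing environment
```

Without the force() calls, changing the global a before the first call to add5 would leak into the result.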



Answer 2:

There's a reason that some languages don't allow global variables: they can easily lead to broken code.

The scoping rules in R allow you to write code in a lazy fashion - letting functions use variables in other environments can save you some typing, and it's great for playing around in simple cases.

If you are doing anything remotely complicated however, then I recommend that you pass a function all the variables that it needs (or at the very least, have some thorough sanity checking in place to have a fallback in case the variables don't exist).

In the example above:

The best practice is to use fn1.
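As a small illustration of why the explicit signature helps, here is a sketch (the tryCatch wrapper is only there to capture the message) showing that fn1 fails immediately when an argument is forgotten, rather than silently picking up whatever happens to be in the workspace:

```r
fn1 <- function(x, a, b) a + b + x

# Calling fn1 without b raises an error as soon as b is needed.
msg <- tryCatch(
    fn1(1:5, a = 3),
    error = function(e) conditionMessage(e)
)

ok <- fn1(1:5, a = 3, b = 3)   # works when everything is supplied
```

Here msg holds the text of the error about the missing argument, while ok holds the usual result.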

Alternatively, try something like

fn3 <- function(x)
{
   # If the global 'a' is missing, warn and fall back to a local default
   if(!exists("a", envir=.GlobalEnv))
   {
      warning("Variable 'a' does not exist in the global environment")
      a <- 1
   }

   # Same fallback for 'b'
   if(!exists("b", envir=.GlobalEnv))
   {
      warning("Variable 'b' does not exist in the global environment")
      b <- 2
   }

   x + a + b
}
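A more compact way to express the same fallback idea is base R's get0(), which returns a default when a name is not found. This is only a sketch: fn4 and its env parameter are hypothetical names introduced here, and unlike fn3 it does not emit a warning.

```r
# Hypothetical variant of fn3: look up 'a' and 'b' in a given environment
# (the global environment by default) and fall back to defaults if absent.
fn4 <- function(x, env = globalenv()) {
    a <- get0("a", envir = env, inherits = FALSE, ifnotfound = 1)
    b <- get0("b", envir = env, inherits = FALSE, ifnotfound = 2)
    x + a + b
}

# Demonstrate with a throwaway environment that defines only 'a'.
e <- new.env()
assign("a", 10, envir = e)
res <- fn4(1, env = e)   # 1 + 10 + 2 = 13 ('b' falls back to 2)
```

Passing the environment explicitly also makes the dependency visible in the function signature.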


Answer 3:

Does the problem come about when you're just using a global variable in a function, or when you try to assign to the variable? If it's the latter, I suspect it's because you're not using <<- for the assignment within the function. And while using <<- appears to be the dark side, it may very well work for your purposes. If it is the former, the function is probably masking the global variable with a local variable of the same name.
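To make the assignment case concrete, here is a minimal sketch (the names counter, bump.local and bump.outer are made up for illustration) contrasting <- and <<- inside a function:

```r
counter <- 0

# '<-' creates a new local variable; the outer 'counter' is untouched.
bump.local <- function() {
    counter <- counter + 1
    invisible(counter)
}

# '<<-' assigns to the existing 'counter' found in an enclosing environment.
bump.outer <- function() {
    counter <<- counter + 1
    invisible(counter)
}

bump.local()
after.local <- counter   # still 0
bump.outer()
after.outer <- counter   # now 1
```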

Naming global variables in a way that makes them difficult to mask locally might help, e.g.: global.pimultiples <- 1:4*pi
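A short sketch of what masking looks like in practice (the function names here are hypothetical): a local variable with a generic name hides the global one, while a distinctively named global is unlikely to be shadowed by accident.

```r
a <- 3
global.pimultiples <- 1:4 * pi

# The local 'a' masks the global 'a' inside the function body.
masked <- function(x) {
    a <- 10
    x + a
}

# A distinctively named global is unlikely to collide with a local name.
uses.global <- function(i) global.pimultiples[i]

r1 <- masked(1)        # 11, uses the local a = 10
r2 <- a                # 3, the global is unchanged
r3 <- uses.global(2)   # 2 * pi
```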



Answer 4:

Usage of global variables is generally discouraged in most languages, and R is no exception. Very often short functions use short, generic variable names, which may also exist in the global environment. It is safest to a) include all the variables in the function definition and b) not assign default values. E.g., write f=function(a,b) rather than f=function(a=0,b=NA).
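The difference can be sketched as follows (f.strict and f.lenient are illustrative names, and the function bodies are an assumption since the answer gives only the signatures): with defaults, a forgotten argument quietly produces a result; without them, the call fails right away.

```r
f.strict  <- function(a, b) a + b
f.lenient <- function(a = 0, b = NA) a + b

# Forgetting 'b' goes unnoticed with defaults: the result is just NA.
quiet <- f.lenient(1)

# Without defaults, the same mistake is reported as an error.
loud <- tryCatch(f.strict(1), error = function(e) "error")
```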



Tags: r scope