I'm reading Advanced R by Hadley Wickham and am testing the following code from this URL:
subset2 = function(df, condition){
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
condition = 3
subset2(df, a < condition)
Then I got the following error message:
Error in eval(substitute(condition), df) : object 'a' not found
I read the explanation as follows but don't quite understand:
If eval() can’t find the variable inside the data frame (its second argument), it looks in the environment of subset2(). That’s obviously not what we want, so we need some way to tell eval() where to look if it can’t find the variables in the data frame.
As I understand it, in eval(substitute(condition), df) the variable that cannot be found should be condition, so why does R report that object 'a' cannot be found?
On the other hand, why does the following code not raise an error?
subset2 = function(df, condition){
condition_call = eval(substitute(condition),df )
df[condition_call,]
}
df = data.frame(a = 1:10, b = 2:11)
y = 3
subset2(df, a < y)
This more stripped-down example may make it easier for you to see what's going on in Hadley's example. The first thing to note is that the symbol condition appears here in four different roles, each of which I've marked with a numbered comment.
## Role of symbol `condition`
f <- function(condition) { #1 -- formal argument
a <- 100
condition + a #2 -- symbol bound to formal argument
}
condition <- 3 #3 -- symbol in global environment
f(condition = condition + a) #4 -- supplied argument (on RHS)
## Error in f(condition = condition + a) (from #1) : object 'a' not found
The other important thing to understand is that symbols in supplied arguments (here the right-hand side of condition = condition + a at #4) are searched for in the evaluation frame of the calling function. From Section 4.3.3 Argument Evaluation of the R Language Definition:
One of the most important things to know about the evaluation of arguments to a function is that supplied arguments and default arguments are treated differently. The supplied arguments to a function are evaluated in the evaluation frame of the calling function. The default arguments to a function are evaluated in the evaluation frame of the function.
In the example above, the evaluation frame of the call to f() is the global environment, .GlobalEnv.
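The difference between supplied and default arguments can be seen in a small sketch. The function g() and the variable x are hypothetical and not part of the example above:

```r
# Hypothetical names (g, x), for illustration only.
x <- "global"

g <- function(arg = x) {
  x <- "local"
  arg  # the argument is only evaluated here, when it is first used
}

# Default argument: evaluated in the function's own evaluation frame,
# so it sees the local x.
default_result <- g()

# Supplied argument: evaluated in the caller's frame (here the global
# environment), so it sees the global x.
supplied_result <- g(arg = x)

print(default_result)   # "local"
print(supplied_result)  # "global"
```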
Taking this step by step, here is what happens when you call f(condition = condition + a). During function evaluation, R comes across the expression condition + a in the function body (at #2). It searches for values of a and condition, and finds a locally assigned symbol a. It finds that the symbol condition is bound to the formal argument named condition (at #1). The value of that formal argument, supplied during the function call, is condition + a (at #4).
As noted in the R Language Definition, the values of the symbols in the expression condition + a are searched for in the environment of the calling function, here the global environment. Since the global environment contains a variable named condition (assigned at #3) but no variable named a, R is unable to evaluate the expression condition + a (at #4), and fails with the error that you see.
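To check this reasoning, one can capture the error and then rerun the call after defining a in the global environment. This is only a sketch: the global a <- 1 is an assumption added for illustration, and the block assumes a fresh session with no a defined beforehand.

```r
f <- function(condition) {
  a <- 100
  condition + a
}
condition <- 3

# No a in the global environment yet: forcing the supplied argument fails.
msg <- tryCatch(
  f(condition = condition + a),
  error = function(e) conditionMessage(e)
)
print(msg)

# Assumption for illustration: define a globally. The supplied argument now
# evaluates to 3 + 1 = 4 in the global environment, and the function body
# then adds its local a (100).
a <- 1
result <- f(condition = condition + a)
print(result)  # 104
```

The local a <- 100 inside f() is never used when evaluating the supplied argument, only when evaluating the body.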
I want to add some details in case someone stumbles on this question. The problematic line is
condition_call = eval(substitute(condition),df )
Inside subset2(), condition is a promise object whose expression slot is "a < condition". substitute(condition) extracts that expression and returns it as a call object, "a < condition".
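What substitute() returns can be inspected directly. In this sketch capture_expr() is a hypothetical helper, not part of the original code:

```r
# Hypothetical helper: returns the unevaluated expression behind its argument.
capture_expr <- function(condition) substitute(condition)

# Neither a nor condition has to exist, because nothing is evaluated.
expr <- capture_expr(a < condition)

print(class(expr))    # "call"
print(deparse(expr))  # "a < condition"
```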
eval() then evaluates "a < condition" with df as its environment, so it needs values for both a and condition.
- a is found in df, so this is not where the error comes from.
- R then looks for condition in df and cannot find it.
- So R moves up to the execution environment of subset2() and finds condition there.
- What it finds is the promise object mentioned above, whose expression slot is "a < condition".
- To evaluate that promise, R has to look up a again, this time in the calling environment of subset2() (here the global environment) rather than in df, so a is not found. This is the step that actually raises the error.
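The lookup of the name condition described in these steps can be isolated in a short sketch. lookup_condition() is a hypothetical helper that evaluates only the symbol condition, using the same eval() defaults as subset2():

```r
# Hypothetical helper: looks up the name condition the same way
# eval(substitute(condition), df) does inside subset2().
lookup_condition <- function(df, condition) {
  eval(quote(condition), df)
}

df <- data.frame(a = 1:10, b = 2:11)
condition <- 3

# df has no column named condition, so eval() falls back to the function's
# execution environment and returns the argument, not the global 3.
found <- lookup_condition(df, "the function argument")
print(found)  # "the function argument"
```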
To summarize the problem here:
- R does find a in df the first time.
- The problem starts when R looks up condition: it picks up the promise object condition (the function argument) instead of the value 3 assigned outside, and tries to evaluate it.
- R then has to evaluate "a < condition" outside df, where a does not exist: it is neither in the execution environment of subset2() nor in the global environment.
For my second example, R cannot find y in df or in the execution environment of subset2(), so it continues to the global environment (from which subset2() was called), where y is 3, and no error occurs. Because the name y differs from the argument name condition, the lookup never hits the promise, so R never has to evaluate "a < y" outside df.
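One way to avoid the name clash, following the approach that the Advanced R chapter goes on to describe, is to capture the expression first and pass the caller's environment as the third argument of eval(). A minimal sketch, with rows as an illustrative variable name:

```r
subset2 <- function(df, condition) {
  # Capture the unevaluated expression, e.g. a < condition.
  condition_call <- substitute(condition)
  # Look in df first, then in the caller's environment, skipping the
  # execution environment of subset2() and its condition argument.
  rows <- eval(condition_call, df, parent.frame())
  df[rows, ]
}

df <- data.frame(a = 1:10, b = 2:11)
condition <- 3
result <- subset2(df, a < condition)
print(result)
```

With this version, condition is resolved to the value 3 in the calling environment, so only the rows with a equal to 1 and 2 are returned.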