I wrote a wrapper around ftable because I need to compute flat tables with frequencies and percentages for many variables. As the ftable method for class "formula" uses non-standard evaluation, the wrapper relies on do.call and match.call to allow the use of the subset argument of ftable (more details in my previous question).
mytable <- function(...) {
  do.call(what = ftable,
          args = as.list(x = match.call()[-1]))
  # etc
}
However, I cannot use this wrapper inside lapply or inside with:
# example 1: error with "lapply"
lapply(X = warpbreaks[c("breaks",
                        "wool",
                        "tension")],
       FUN = mytable,
       row.vars = 1)
Error in (function (x, ...) : object 'X' not found
# example 2: error with "with"
with(data = warpbreaks[warpbreaks$tension == "L", ],
     expr = mytable(wool))
Error in (function (x, ...) : object 'wool' not found
These errors seem to be due to match.call not being evaluated in the right environment.
As this question is closely linked to my previous one, here is a summary of my problems:
- The wrapper with do.call and match.call cannot be used with lapply or with.
- The wrapper without do.call and match.call cannot use the subset argument of ftable.
And a summary of my questions:
- How can I write a wrapper that both supports the subset argument of ftable and can be used inside lapply and with? I have ideas for avoiding lapply and with, but I want to understand and correct these errors to improve my knowledge of R.
- Is the error with lapply related to the following note from ?lapply?
For historical reasons, the calls created by lapply are unevaluated,
and code has been written (e.g., bquote) that relies on this. This
means that the recorded call is always of the form FUN(X[[i]], ...),
with i replaced by the current (integer or double) index. This is not
normally a problem, but it can be if FUN uses sys.call or match.call
or if it is a primitive function that makes use of the call. This
means that it is often safer to call primitive functions with a
wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is
required to ensure that method dispatch for is.numeric occurs
correctly.
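The recorded call that the note describes can be inspected directly. This is only a minimal sketch: show_call is a hypothetical helper (not part of my wrapper) that does nothing but return sys.call().

```r
# Hypothetical helper: return the call exactly as it was recorded.
show_call <- function(x, ...) sys.call()

# lapply builds the call itself, so the recorded call refers to lapply's
# own names (FUN, X) rather than to anything typed by the user.
recorded <- lapply(X = warpbreaks["wool"], FUN = show_call)

print(recorded[[1]])
```

The function position holds the symbol FUN and the first argument is an index into X, which is consistent with the "object 'X' not found" error in example 1.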
The problem with using match.call with lapply is that match.call returns the literal call that was passed to it, without any interpretation. To see what's going on, let's make a simpler function that shows exactly how your function interprets the arguments passed to it:
match_call_fun <- function(...) {
  call <- as.list(match.call()[-1])
  print(call)
}
When we call it directly, match.call correctly gets the arguments and puts them in a list that we can use with do.call:
match_call_fun(iris['Species'], 9)
[[1]]
iris["Species"]
[[2]]
[1] 9
But watch what happens when we use lapply (I've only included the output of the internal print statement):
lapply('Species', function(x) match_call_fun(iris[x], 9))
[[1]]
iris[x]
[[2]]
[1] 9
Since match.call gets the literal arguments passed to it, it receives iris[x], not the properly interpreted iris['Species'] that we want. When we pass those arguments into ftable with do.call, it looks for an object x in the current environment, and then returns an error when it can't find it. We need to evaluate iris[x] in an environment where x is defined.
As you've seen, adding envir = parent.frame() fixes the problem. This is because adding that argument tells do.call to evaluate iris[x] in the parent frame, which is the anonymous function in lapply, where x has its proper meaning. To see this in action, let's make another simple function that uses do.call to print ls from three different environment levels:
z <- function(...) {
  print(do.call(ls, list()))
  print(do.call(ls, list(), envir = parent.frame()))
  print(do.call(ls, list(), envir = parent.frame(2)))
}
When we call z() from the global environment, we see the empty environment inside the function, then the global environment twice, since there is nothing above it:
z()
character(0) # Interior function environment
[1] "match_call_fun" "y" "z" # GlobalEnv
[1] "match_call_fun" "y" "z" # GlobalEnv
But when we call it from within lapply, we see that one level of parent.frame up is the frame of lapply itself:
lapply(1, z)
character(0) # Interior function environment
[1] "FUN" "i" "X" # lapply
[1] "match_call_fun" "y" "z" # GlobalEnv
So, by adding envir = parent.frame(), do.call knows to evaluate iris[x] in the calling environment inside lapply, where x is actually 'Species', and it evaluates correctly.
mytable_envir <- function(...) {
  tab <- do.call(what = ftable,
                 args = as.list(match.call()[-1]),
                 envir = parent.frame())
  prop <- prop.table(x = tab,
                     margin = 2) * 100
  bind <- cbind(as.matrix(x = tab),
                as.matrix(x = prop))
  margin <- addmargins(A = bind,
                       margin = 1)
  round(x = margin,
        digits = 1)
}
# This works!
lapply(X = c("breaks", "wool", "tension"),
       FUN = function(x) mytable_envir(warpbreaks[x], row.vars = 1))
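The effect of envir can also be isolated from ftable. The sketch below is illustrative only: run_default, run_parent, caller and local_label are hypothetical names, and paste0 merely stands in for the wrapped function.

```r
# Both helpers receive a list holding an unevaluated symbol and hand it to do.call.
run_default <- function(arg_list) do.call(what = paste0, args = arg_list)
run_parent  <- function(arg_list) do.call(what = paste0, args = arg_list,
                                          envir = parent.frame())

caller <- function() {
  local_label <- "found in caller"        # exists only in this frame
  arg_list <- list(quote(local_label))    # unevaluated symbol, like match.call() output
  list(
    # Default envir: the symbol is looked up from run_default's frame, which fails.
    default = tryCatch(run_default(arg_list), error = function(e) "not found"),
    # Explicit envir: the symbol is looked up in caller's frame.
    parent  = run_parent(arg_list)
  )
}

res <- caller()
str(res)
```

Only the version that passes envir = parent.frame() can see local_label, which mirrors what happens to x in the lapply example.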
As for why adding envir = parent.frame() makes a difference even though it appears to be the default option: I'm not 100% sure, but my guess is that when the default argument is used, parent.frame is evaluated inside the do.call function, returning the environment in which do.call is run. What we're doing, however, is calling parent.frame outside do.call, which means it returns one level higher than the default version.
Here's a test function that takes parent.frame() as a default value:
fun <- function(y = parent.frame()) {
  print(y)
  print(parent.frame())
  print(parent.frame(2))
  print(parent.frame(3))
}
Now look at what happens when we call it from within lapply, both with and without passing in parent.frame() as an argument:
lapply(1, function(y) fun())
<environment: 0x12c5bc1b0> # y argument
<environment: 0x12c5bc1b0> # parent.frame called inside
<environment: 0x12c5bc760> # 1 level up = lapply
<environment: R_GlobalEnv> # 2 levels up = globalEnv
lapply(1, function(y) fun(y = parent.frame()))
<environment: 0x104931358> # y argument
<environment: 0x104930da8> # parent.frame called inside
<environment: 0x104931358> # 1 level up = lapply
<environment: R_GlobalEnv> # 2 levels up = globalEnv
In the first example, the value of y is the same as what you get when you call parent.frame() inside the function. In the second example, the value of y is the same as the environment one level up (inside lapply). So, while they look the same, they're actually doing different things: in the first example, parent.frame is evaluated inside the function when it sees that there is no y= argument; in the second, parent.frame is evaluated in the lapply anonymous function first, before calling fun, and is then passed into it.
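The same comparison can be made with identical() instead of reading environment addresses. This is a sketch with hypothetical helpers (where, middle and top are names made up for the illustration):

```r
# Hypothetical helper: return whatever `env` evaluates to.
where <- function(env = parent.frame()) env

middle <- function() {
  list(
    here     = environment(),                # middle's own frame
    default  = where(),                      # default: evaluated inside where()
    supplied = where(env = parent.frame())   # supplied: evaluated in middle's frame
  )
}

top <- function() {
  list(top_env = environment(), res = middle())
}

out <- top()
# The default points at the frame that called where(), i.e. middle().
identical(out$res$default, out$res$here)
# The supplied value points one level higher, at the frame that called middle().
identical(out$res$supplied, out$top_env)
```

Wrapping everything in top() avoids relying on the global environment, so the comparison does not depend on where the script itself is run from.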
As you only want to forward all the arguments to ftable, you do not need do.call().
mytable <- function(...) {
  tab <- ftable(...)
  prop <- prop.table(x = tab,
                     margin = 2) * 100
  bind <- cbind(as.matrix(x = tab),
                as.matrix(x = prop))
  margin <- addmargins(A = bind,
                       margin = 1)
  return(round(x = margin,
               digits = 1))
}
The following lapply creates a table for every variable separately; I don't know if that is what you want.
lapply(X = c("breaks",
             "wool",
             "tension"),
       FUN = function(x) mytable(warpbreaks[x],
                                 row.vars = 1))
If you want all three variables in one table:
warpbreaks$newVar <- LETTERS[3:4]
lapply(X = cbind("c(\"breaks\", \"wool\", \"tension\")",
"c(\"newVar\", \"tension\",\"wool\")"),
FUN = function(X)
eval(parse(text=paste("mytable(warpbreaks[,",X,"],
row.vars = 1)")))
)
Thanks to this issue, the wrapper became:
# function 1
mytable <- function(...) {
  do.call(what = ftable,
          args = as.list(x = match.call()[-1]),
          envir = parent.frame())
  # etc
}
Or:
# function 2
mytable <- function(...) {
  mc <- match.call()
  mc[[1]] <- quote(expr = ftable)
  eval.parent(expr = mc)
  # etc
}
I can now use the subset argument of ftable, and use the wrapper in lapply:
lapply(X = warpbreaks[c("wool",
                        "tension")],
       FUN = function(x) mytable(formula = x ~ breaks,
                                 data = warpbreaks,
                                 subset = breaks < 15))
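As a sanity check, the wrapper's result can be compared with a direct call to ftable. This sketch uses the match.call/eval.parent variant under the hypothetical name mytable2 and leaves out the "# etc" post-processing, so it only checks that the arguments, including subset, reach ftable unchanged:

```r
# Function 2, reduced to the forwarding step (mytable2 is a hypothetical name).
mytable2 <- function(...) {
  mc <- match.call()
  mc[[1]] <- quote(ftable)   # swap the wrapper's name for ftable
  eval.parent(mc)            # evaluate the rewritten call where the wrapper was called
}

direct  <- ftable(formula = wool ~ breaks, data = warpbreaks, subset = breaks < 15)
wrapped <- mytable2(formula = wool ~ breaks, data = warpbreaks, subset = breaks < 15)

# Compare the two flat tables.
all.equal(direct, wrapped)
```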
However, I do not understand why I have to supply envir = parent.frame() to do.call, as it is the default argument.
More importantly, these methods do not resolve another issue: I cannot use the subset argument of ftable with mapply.