This has really challenged my ability to debug R code.
I want to use ddply() to apply the same functions to different columns that are sequentially named, e.g. a, b, c. To do this I intend to repeatedly pass the column name as a string and use eval(parse(text=ColName)) to allow the function to reference it. I grabbed this technique from another answer.
This works well until I put ddply() inside another function. Here is the sample code:
# Required packages:
library(plyr)
myFunction <- function(x, y){
    NewColName = "a"
    z = ddply(x, y, summarize,
        Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
    )
    return(z)
}
a = c(1,2,3,4)
b = c(0,0,1,1)
c = c(5,6,7,8)
df = data.frame(a,b,c)
sv = c("b")
#This works.
ColName = "a"
ddply(df, sv, summarize,
    Ave = mean(eval(parse(text=ColName)), na.rm=TRUE)
)
#This doesn't work
#Produces error: "Error in parse(text = NewColName) : object 'NewColName' not found"
myFunction(df,sv)
#Output in both cases should be
# b Ave
#1 0 1.5
#2 1 3.5
Any ideas? NewColName is even defined inside the function!
I thought the answer to this question, loops-to-create-new-variables-in-ddply, might help me but I've done enough head banging for today and it's time to raise my hand and ask for help.
Looks like you have an environment problem. Global assignment fixes the problem, but at the cost of one's soul:
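The snippet that belonged here is not shown, so the following is only a sketch of what such a global assignment might look like, assuming the fix is to swap the local assignment for `<<-`. When no NewColName exists in an enclosing environment, `<<-` creates it in the global environment, which is where the expression inside summarize ends up looking:

```r
library(plyr)

myFunction <- function(x, y) {
  # Assumption: `<<-` stands in for the "global assignment" fix.
  # It writes NewColName into the global environment as a side effect.
  NewColName <<- "a"
  ddply(x, y, summarize,
        Ave = mean(eval(parse(text = NewColName)), na.rm = TRUE))
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))
res <- myFunction(df, "b")
```

The side effect is the catch: every call silently overwrites any existing global NewColName, so this is a workaround rather than something to build on.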
eval is looking in parent.frame(1). So if you instead define NewColName outside myFunction, it should work.
By using get to pull out my.parse from the earlier environment, we can come much closer, but still have to pass curenv as a global.
I suspect that ddply is evaluating in the .GlobalEnv already, which is why all of the parent.frame() and sys.frame() strategies I tried failed.
I occasionally run into problems like this when combining ddply with summarize or transform or something and, not being smart enough to divine the ins and outs of navigating various environments, I tend to side-step the issue by simply not using summarize and instead using my own anonymous function.
Obviously, there is a cost to doing this stuff 'manually', but it often avoids the headache of dealing with the evaluation issues that come from combining ddply and summarize. That's not to say, of course, that Hadley won't show up with a solution...
You can do this with a combination of do.call and call to construct the call in an environment where NewColName is still visible.
Today's solution to this question is to make summarize into here(summarize). here(f), added to plyr in Dec 2012, captures the current context.
The problem lies in the code of the plyr package itself. In the summarize function, there is a line eval(substitute(...),.data,parent.frame()). It is well known that parent.frame() can do pretty funky and unexpected stuff. The solution of @James is a very nice workaround, but if I remember right @Hadley himself said before that the plyr package was not intended to be used within functions. Sorry, I was wrong here. It is known though that for the moment, the plyr package gives problems in these situations.
Hence, I give you a base solution for the problem:
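The original snippet is missing here, so what follows is only a sketch of one way to do this in base R, assuming aggregate() is an acceptable stand-in for ddply() with summarize. Columns are selected by name with `[`, so no eval(parse()) is needed. The renaming to Ave is added here to match the output in the question.

```r
myFunction <- function(x, y) {
  NewColName <- "a"
  # Select the value and grouping columns by name; no eval(parse()) needed.
  z <- aggregate(x[NewColName], by = x[y], FUN = mean, na.rm = TRUE)
  # Rename the summarised column to match the output asked for in the question.
  names(z)[names(z) == NewColName] <- "Ave"
  z
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))
sv <- c("b")
res <- myFunction(df, sv)
```

For the sample data this gives an Ave of 1.5 for b = 0 and 3.5 for b = 1, which is the output the question asks for. Because everything is looked up inside the function's own frame, it behaves the same whether it is called from the top level or from another function.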