Question:
This has really challenged my ability to debug R code.
I want to use ddply() to apply the same functions to different columns that are named sequentially, e.g. a, b, c. To do this, I intend to repeatedly pass the column name as a string and use eval(parse(text=ColName)) so that the function can reference it. I grabbed this technique from another answer.
And this works well, until I put ddply() inside another function. Here is the sample code:
# Required packages:
library(plyr)

myFunction <- function(x, y){
  NewColName = "a"
  z = ddply(x, y, summarize,
            Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
  )
  return(z)
}
a = c(1,2,3,4)
b = c(0,0,1,1)
c = c(5,6,7,8)
df = data.frame(a,b,c)
sv = c("b")
#This works.
ColName = "a"
ddply(df, sv, summarize,
      Ave = mean(eval(parse(text=ColName)), na.rm=TRUE)
)
#This doesn't work
#Produces error: "Error in parse(text = NewColName) : object 'NewColName' not found"
myFunction(df,sv)
#Output in both cases should be
# b Ave
#1 0 1.5
#2 1 3.5
Any ideas? NewColName is even defined inside the function!
I thought the answer to this question, loops-to-create-new-variables-in-ddply, might help me, but I've done enough head banging for today and it's time to raise my hand and ask for help.
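The failure can be reproduced without plyr. The sketch below uses two hypothetical stand-ins, toy_summarize() and toy_ddply(), which are simplified assumptions rather than plyr's actual source. They follow the pattern the answers below discuss: each expression is captured with substitute() and evaluated with eval(expr, .data, parent.frame()), where parent.frame() is the frame of toy_ddply(), not of myFunction().

```r
# Hypothetical stand-in for summarize (simplified assumption, not plyr's code):
# capture the named expressions and evaluate each one inside the data,
# using the caller's frame as the enclosure.
toy_summarize <- function(.data, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  env <- parent.frame()  # the frame of whoever called toy_summarize()
  as.data.frame(lapply(exprs, function(e) eval(e, .data, env)))
}

# Hypothetical stand-in for ddply: split on one column and call .fun on
# each piece, forwarding the extra arguments in `...` unchanged.
toy_ddply <- function(.data, .var, .fun, ...) {
  out <- NULL
  for (piece in split(.data, .data[[.var]])) {
    res <- .fun(piece, ...)
    out <- rbind(out, cbind(piece[1, .var, drop = FALSE], res))
  }
  rownames(out) <- NULL
  out
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))

# Top level: ColName lives in the global environment, which is reachable
# from toy_ddply()'s frame, so the lookup succeeds.
ColName <- "a"
top <- toy_ddply(df, "b", toy_summarize,
                 Ave = mean(eval(parse(text = ColName)), na.rm = TRUE))
print(top)

# Inside a function: NewColName lives in myFunction()'s frame, which is
# not on the lookup path, so evaluation fails as in the question.
myFunction <- function(x, y) {
  NewColName <- "a"
  toy_ddply(x, y, toy_summarize,
            Ave = mean(eval(parse(text = NewColName)), na.rm = TRUE))
}
err <- tryCatch(myFunction(df, "b"), error = function(e) conditionMessage(e))
print(err)
```

At top level the group means come out as 1.5 and 3.5; inside myFunction() the same expression fails with "object 'NewColName' not found", matching the question.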
Answer 1:
You can do this with a combination of do.call and call to construct the call in an environment where NewColName is still visible:
myFunction <- function(x,y){
  NewColName <- "a"
  # Build mean(a, na.rm=TRUE) as a call object and splice it into the ddply call
  z <- do.call("ddply",list(x, y, summarize, Ave = call("mean",as.symbol(NewColName),na.rm=TRUE)))
  return(z)
}
myFunction(df,sv)
b Ave
1 0 1.5
2 1 3.5
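Why this works: call() and as.symbol() turn the string "a" into the ready-made expression mean(a, na.rm = TRUE) while still inside myFunction, and do.call() splices that expression into the ddply call as if it had been typed literally, so summarize never has to look up NewColName. A base-R sketch of the same idea, with split() and sapply() standing in for ddply and a hypothetical capture_dots() standing in for summarize's substitute() step (both assumptions for illustration only):

```r
# Build the expression mean(a, na.rm = TRUE) from a column name held as a string.
col_name <- "a"
ave_call <- call("mean", as.symbol(col_name), na.rm = TRUE)
print(ave_call)  # mean(a, na.rm = TRUE)

# do.call() splices the prebuilt call into a new call, so a function that
# captures its arguments with substitute() sees the literal expression.
capture_dots <- function(...) substitute(list(...))
captured <- do.call(capture_dots, list(Ave = ave_call))
print(captured)  # list(Ave = mean(a, na.rm = TRUE))

df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))

# Evaluate the prebuilt expression inside each group; the symbol `a` is
# resolved in the group's data, so col_name is not needed at this point.
ave <- sapply(split(df, df$b), function(piece) eval(ave_call, piece))
print(ave)
```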
Answer 2:
Today's solution to this question is to replace summarize with here(summarize), e.g.:
myFunction <- function(x, y){
  NewColName = "a"
  # here() captures myFunction's environment, so NewColName is visible to summarize
  z = ddply(x, y, here(summarize),
            Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
  )
  return(z)
}
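here(f), added to plyr in Dec 2012, captures the current context (the environment it is called from), so summarize can see variables defined inside myFunction, such as NewColName.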
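A rough base-R model of why wrapping helps. It reuses hypothetical toy_summarize()/toy_ddply() stand-ins (simplified assumptions, not plyr's code) and a hypothetical toy_here() built on the idea above: it wraps f in a new closure created in the caller's frame, so parent.frame() inside the summarizing function leads back to myFunction's local variables. plyr's own here() may differ in detail.

```r
# Hypothetical stand-in for summarize: evaluate each named expression in
# the data, enclosed by the caller's frame.
toy_summarize <- function(.data, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  env <- parent.frame()
  as.data.frame(lapply(exprs, function(e) eval(e, .data, env)))
}

# Hypothetical stand-in for ddply: split on one column, call .fun per piece.
toy_ddply <- function(.data, .var, .fun, ...) {
  out <- NULL
  for (piece in split(.data, .data[[.var]])) {
    res <- .fun(piece, ...)
    out <- rbind(out, cbind(piece[1, .var, drop = FALSE], res))
  }
  rownames(out) <- NULL
  out
}

# Hypothetical here(): build `function(...) (f)(...)` with f inlined and
# create that closure in the caller's frame, so its enclosure is the caller.
toy_here <- function(f) {
  caller <- parent.frame()
  eval(substitute(function(...) (f)(...), list(f = f)), caller)
}

myFunction <- function(x, y) {
  NewColName <- "a"
  toy_ddply(x, y, toy_here(toy_summarize),
            Ave = mean(eval(parse(text = NewColName)), na.rm = TRUE))
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))
res <- myFunction(df, "b")
print(res)
```

The wrapper's frame is now what parent.frame() returns inside toy_summarize(), and its enclosure is myFunction's frame, so the lookup for NewColName succeeds.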
Answer 3:
I occasionally run into problems like this when combining ddply with summarize or transform or something, and, not being smart enough to divine the ins and outs of navigating various environments, I tend to side-step the issue by simply not using summarize and instead using my own anonymous function:
myFunction <- function(x, y){
  NewColName <- "a"
  # Pass the column name as an extra argument; ddply forwards it to .fun as `col`
  z <- ddply(x, y, .fun = function(xx,col){
    c(Ave = mean(xx[,col],na.rm=TRUE))},
    NewColName)
  return(z)
}
myFunction(df,sv)
Obviously, there is a cost to doing this stuff 'manually', but it often avoids the headache of dealing with the evaluation issues that come from combining ddply and summarize. That's not to say, of course, that Hadley won't show up with a solution...
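The same side-step works without plyr at all. In this sketch, split() and vapply() stand in for ddply (an assumption for illustration), and the column name travels as an ordinary argument, just like the extra NewColName argument above, so it is indexed with [[ ]] and no environment lookup is involved.

```r
myFunction <- function(x, y) {
  NewColName <- "a"
  pieces <- split(x, x[[y]])
  # Like ddply's `...`, vapply() forwards `col = NewColName` to the
  # anonymous function, which indexes the column by name with [[ ]].
  ave <- vapply(pieces, function(xx, col) mean(xx[[col]], na.rm = TRUE),
                numeric(1), col = NewColName)
  keys <- vapply(pieces, function(xx) xx[[y]][1], numeric(1))
  out <- data.frame(keys, ave, row.names = NULL)
  names(out) <- c(y, "Ave")
  out
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1), c = c(5, 6, 7, 8))
res <- myFunction(df, "b")
print(res)
```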
Answer 4:
The problem lies in the code of the plyr package itself. In the summarize function, there is a line eval(substitute(...),.data,parent.frame()). It is well known that parent.frame() can behave in unexpected ways: it returns the frame of whichever function called summarize, which inside ddply is an internal plyr function rather than myFunction, so variables local to myFunction, such as NewColName, are not on the lookup path. The solution of @James is a very nice workaround, but if I remember right, @Hadley himself once said that the plyr package was not intended to be used within functions.
Sorry, I was wrong here. It is known, though, that for the moment the plyr package gives problems in these situations.
Hence, I give you a base solution for the problem:
myFunction <- function(x, y){
  NewColName = "a"
  z = aggregate(x[NewColName],x[y],mean,na.rm=TRUE)
  return(z)
}
> myFunction(df,sv)
b a
1 0 1.5
2 1 3.5
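To see why parent.frame() misleads here: it returns the frame of whichever function made the call, not the frame where the expression was written. In this base-R sketch (all function names are hypothetical), middle() plays the role of the internal plyr function that actually invokes summarize.

```r
# Return the names of the variables in the calling function's frame.
show_caller_vars <- function() {
  env <- parent.frame()
  ls(envir = env)
}

# Called directly: the caller is direct(), so NewColName is visible.
direct <- function() {
  NewColName <- "a"
  show_caller_vars()
}

# Called through an intermediary: the caller is middle(), whose frame only
# holds its own argument `f`; NewColName in via_middle() is out of reach.
middle <- function(f) f()
via_middle <- function() {
  NewColName <- "a"
  middle(show_caller_vars)
}

print(direct())      # [1] "NewColName"
print(via_middle())  # [1] "f"
```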
Answer 5:
Looks like you have an environment problem. Global assignment fixes the problem, but at the cost of one's soul:
library(plyr)
a = c(1,2,3,4)
b = c(0,0,1,1)
c = c(5,6,7,8)
d.f = data.frame(a,b,c)
sv = c("b")
ColName = "a"
ddply(d.f, sv, summarize,
      Ave = mean(eval(parse(text=ColName)), na.rm=TRUE)
)
myFunction <- function(x, y){
  # <<- assigns NewColName outside the function (here, the global environment)
  NewColName <<- "a"
  z = ddply(x, y, summarize,
            Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
  )
  return(z)
}
myFunction(x=d.f,y=sv)
eval is looking in parent.frame(1). So if you instead define NewColName outside myFunction, it should work:
rm(NewColName)
NewColName <- "a"
myFunction <- function(x, y){
  z = ddply(x, y, summarize,
            Ave = mean(eval(parse(text=NewColName)), na.rm=TRUE)
  )
  return(z)
}
myFunction(x=d.f,y=sv)
By using get to pull my.parse out of the earlier environment, we can come much closer, but we still have to pass curenv as a global:
myFunction <- function(x, y){
  NewColName <- "a"
  my.parse <- parse(text=NewColName)
  print(my.parse)
  curenv <<- environment()
  print(curenv)
  z = ddply(x, y, summarize,
            Ave = mean( eval( get("my.parse" , envir=curenv ) ), na.rm=TRUE)
  )
  return(z)
}
> myFunction(x=d.f,y=sv)
expression(a)
<environment: 0x0275a9b4>
b Ave
1 0 1.5
2 1 3.5
I suspect that ddply is evaluating in the .GlobalEnv already, which is why all of the parent.frame() and sys.frame() strategies I tried failed.
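That suspicion can be checked with a small base-R sketch. Using hypothetical toy_ddply()/toy_summarize() stand-ins as a simplified model (not plyr's real code), the hypothetical helper enclosure_chain() walks the environments that eval() searches after the data: the stand-in's own frame, then the global environment and the attached packages. Nothing in that chain belongs to myFunction(). For a function that lives inside a package, the chain would pass through that package's namespace and the base namespace before reaching the global environment.

```r
# Walk from `env` through parent.env() and record each environment's name.
enclosure_chain <- function(env) {
  out <- character(0)
  while (!identical(env, emptyenv())) {
    name <- environmentName(env)
    out <- c(out, if (nzchar(name)) name else "<function frame>")
    env <- parent.env(env)
  }
  out
}

# Hypothetical stand-ins: instead of evaluating anything, toy_summarize()
# reports what a lookup starting at its parent.frame() would search.
toy_summarize <- function(.data, ...) {
  env <- parent.frame()
  list(chain = enclosure_chain(env),
       sees_NewColName = exists("NewColName", envir = env))
}
toy_ddply <- function(.data, .var, .fun, ...) {
  pieces <- split(.data, .data[[.var]])
  .fun(pieces[[1]], ...)  # one group is enough for this check
}

myFunction <- function(x, y) {
  NewColName <- "a"
  toy_ddply(x, y, toy_summarize)
}

df <- data.frame(a = c(1, 2, 3, 4), b = c(0, 0, 1, 1))
info <- myFunction(df, "b")
print(info$chain)
print(info$sees_NewColName)
```

Global assignment with <<- works for the same reason: it puts NewColName in the global environment, which is on this chain.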