Using ddply inside a function

Posted 2019-02-10 17:12

Question:

I'm trying to write a function that uses ddply internally, but I can't get it to work. This is a dummy example reproducing what I get. Does this have anything to do with this bug?

library(plyr)     # provides ddply and .()
library(ggplot2)  # provides the diamonds data set
data(diamonds)

foo <- function(data, fac1, fac2, bar) {
  res <- ddply(data, .(fac1, fac2), mean(bar))
  res
}

foo(diamonds, "color", "cut", "price")

Answer 1:

I don't believe this is a bug. ddply expects a function as its third argument, and mean(bar) is not one: it is a call, and its result is not a function. You need to write a complete function that calculates the mean you'd like:

foo <- function(data, fac1, fac2, bar) {
  # x is the subset of rows for one group; ind is the column name passed in via bar
  res <- ddply(data, c(fac1, fac2), function(x, ind) {
    mean(x[[ind]])
  }, bar)
  res
}

Also, .() is meant for unquoted column names, not strings, so I changed it to c(). That way the function arguments can be passed straight through to ddply.
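The trailing bar in the call above works because ddply forwards any extra arguments after the function on to that function. A minimal sketch of the mechanism, assuming plyr is installed; the toy data frame and the helper col_mean are made up for illustration:

```r
library(plyr)

# Hypothetical toy data; the names g, x and y are invented for this sketch
toy <- data.frame(g = c("a", "a", "b"), x = c(1, 3, 10), y = c(2, 4, 6))

# Hypothetical helper: first argument is one group's rows, second a column name
col_mean <- function(piece, colname) mean(piece[[colname]])

# "x" is not consumed by ddply itself; it is handed on to col_mean
mean_x <- ddply(toy, "g", col_mean, "x")

# Same mechanism, with the forwarded argument named explicitly
mean_y <- ddply(toy, "g", col_mean, colname = "y")

print(mean_x)
print(mean_y)
```

Because col_mean returns a bare number, ddply stores it in a column called V1.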



Answer 2:

There are quite a few things wrong with your code, but the main issue is: you are passing column names as character strings.

Just doing a 'find-and-replace' with your parameters within the function yields:

res <- ddply(diamonds, .("color", "cut"), mean("price"))

If you understand how ddply works (I kind of doubt this, given the rest of the code), you will understand that this is not supposed to work. Ignoring the error in the last part (the function), the call should look like this. Notice the lack of quotes: the .() notation is nothing more than plyr's way of providing the quotes.

res <- ddply(diamonds, .(color, cut), mean(price))

Fortunately, ddply also supports passing its second argument as a character vector, i.e. the names of the columns, so (once again disregarding issues with the last parameter) this should become:

foo <- function(data, facs, bar) {
  res <- ddply(data, facs, mean(bar))
  res
}

foo(diamonds, c("color", "cut"), "price")
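To check that the two ways of naming the grouping columns really are interchangeable, here is a small sketch on a hypothetical toy data frame standing in for diamonds. An anonymous function is used as the last argument purely so that the calls run:

```r
library(plyr)

# Hypothetical toy data with the same column names as the diamonds example
toy <- data.frame(
  color = c("D", "D", "E", "E"),
  cut   = c("Fair", "Fair", "Good", "Good"),
  price = c(100, 200, 300, 500)
)

# Grouping columns given unquoted via .()
by_quoted <- ddply(toy, .(color, cut), function(d) mean(d$price))

# Grouping columns given as a character vector
by_names <- ddply(toy, c("color", "cut"), function(d) mean(d$price))

# Compare the two results
print(all.equal(by_quoted, by_names))
```

The character-vector form is the one that can be built from function arguments, which is why it suits a wrapper like foo.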

Finally: the function you pass to ddply should take a data.frame as its first argument. On each call, that data.frame holds the part of the data.frame you passed along (diamonds) for the current values of color and cut. Neither mean("price") nor mean(price) is such a function. If you insist on using ddply, here's what you need to do:

foo <- function(data, facs, bar) {
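  # dfr is one group's rows; colnm receives bar. [[ ]] returns a plain vector even if data is a tibble
  res <- ddply(data, facs, function(dfr, colnm){mean(dfr[[colnm]])}, bar)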
  res
}
foo(diamonds, c("color", "cut"), "price")
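As a quick sanity check, the same idea can be tried on a few rows of made-up data, where the group means are easy to verify by hand. This is only a sketch: the toy data frame is hypothetical, and the column is extracted with [[ ]], an assumption made here so that the helper behaves the same for plain data frames and tibbles.

```r
library(plyr)

# Same shape as the function above, written compactly
foo <- function(data, facs, bar) {
  ddply(data, facs, function(dfr, colnm) mean(dfr[[colnm]]), bar)
}

# Hypothetical toy data standing in for diamonds
toy <- data.frame(
  color = c("D", "D", "E", "E", "E"),
  cut   = c("Fair", "Good", "Fair", "Fair", "Good"),
  price = c(100, 200, 300, 500, 700)
)

# One row per (color, cut) combination, with the mean in column V1
res <- foo(toy, c("color", "cut"), "price")
print(res)
```

Here the E/Fair group has two rows (300 and 500), so its mean is 400; every other group has a single row and keeps its price.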


Tags: r plyr