I use ddply to summarize some data.frame
by various categories, like this:
# with both group and size being factors / categorical
split.df <- ddply(mydata,.(group,size),summarize,
sumGroupSize = sum(someValue))
This works smoothly, but often I like to calculate ratios which implies that I need to divide by the group's total. How can I calculate such a total within the same ddply
call?
Let's say I'd like to have the share of observations in group A that are in size class 1. Obviously I have to calculate the sum of all observations in size class 1 first. Sure I could do this with two ddply calls, but using all one call would be more comfortable. Is there a way to do so?
EDIT: I did not mean to ask overly specific, but I realize I was disturbing people here. So here's my specific problem. In fact I do have an example that works, but I don't consider it really nifty. Plus it has a shortcoming that I need to overcome: it does not work correctly with apply.
library(plyr)
# make the dataset more "realistic"
mydata <- warpbreaks
names(mydata) <- c("someValue","group","size")
mydata$category <- c(1,2,3)
mydata$categoryA <- c("A","A","X","X","Z","Z")
# add some NA
mydata$category[c(8,10,19)] <- NA
mydata$categoryA[c(14,1,20)] <- NA
# someValue is summarized !
# note we have a another, varying category hence we need the a parameter
calcShares <- function(a, data) {
# !is.na needs to be specific!
tempres1 <- eval(substitute(ddply(data[!is.na(a),],.(group,size,a),summarize,
sumTest = sum(someValue,na.rm=T))),
envir=data, enclos=parent.frame())
tempres2 <- eval(substitute(ddply(data[!is.na(a),],.(group,size),summarize,
sumTestTotal = sum(someValue,na.rm=T))),
envir=data, enclos=parent.frame())
res <- merge(tempres1,tempres2,by=c("group","size"))
res$share <- res$sumTest/res$sumTestTotal
return(res)
}
test <- calcShares(category,mydata)
test2 <- calcShares(categoryA,mydata)
head(test)
head(test2)
As you can see I intend to run this over different categorical variables. In the example I have only two (category, categoryA) but in fact I got more, so using apply with my function would be really nice, but somehow it does not work correctly.
applytest <- head(apply(mydata[grep("^cat",
names(mydata),value=T)],2,calcShares,data=mydata))
.. returns a warning message and a strange name (newX[, i] ) for the category var.
So how can I do THIS a) more elegantly and b) fix the apply issue?