I often have data that is grouped by one or more variables, with several observations within each group. From the data frame, I wish to select groups according to various criteria.
I commonly use a split-sapply-rbind approach, where I extract elements from a list using a logical vector.
Here is a small example. I start with a data frame with one grouping variable ('group'), and I wish to select groups that have a maximum mass of less than 45:
dd <- data.frame(group = rep(letters[1:3], each = 5),
                 mass = c(rnorm(5, 30), rnorm(5, 50),
                          rnorm(5, 40)))
dd2 <- split(x = dd, f = dd$group)
dd3 <- dd2[sapply(dd2, function(x) max(x$mass) < 45)]
dd4 <- do.call(rbind, dd3)
I have just started to use plyr, and now I wonder: is there a plyr-only alternative to achieve this?
At least in this situation, this gives the same result:
library(plyr)
dd5 <- ddply(dd, .(group), function(x) x[max(x$mass) < 45, ])
all(dd4==dd5)
[1] TRUE
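The subsetting inside the anonymous function works because a length-one logical row index is recycled: a single TRUE keeps every row of the group, and a single FALSE returns zero rows. Here is a minimal base R sketch of that behaviour. The data frame `toy` and the helper names are hypothetical, chosen only to make the result deterministic; they are not part of the original example.

```r
# Hypothetical deterministic data (illustration only, not the original dd)
toy <- data.frame(group = rep(c("a", "b"), each = 3),
                  mass  = c(30, 31, 29, 50, 44, 51))

grp_a <- toy[toy$group == "a", ]  # maximum mass is 31
grp_b <- toy[toy$group == "b", ]  # maximum mass is 51

# A single TRUE is recycled across all rows: the whole group is kept
keep_a <- grp_a[max(grp_a$mass) < 45, ]

# A single FALSE is recycled as well: the group collapses to zero rows
keep_b <- grp_b[max(grp_b$mass) < 45, ]

nrow(keep_a)  # 3
nrow(keep_b)  # 0
```

ddply then row-binds whatever each group returns, so groups that collapse to zero rows simply drop out of the result.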
Here is a data.table solution, for coding elegance:
library(data.table)
DT <- data.table(dd)
DT[,if(max(mass) < 45){.SD},by=group]
group mass
1: a 28.80426
2: a 31.31232
3: a 29.47599
4: a 30.35425
5: a 29.92833
6: c 40.11349
7: c 40.17431
8: c 39.94652
9: c 39.57524
10: c 40.20791
A perhaps slightly more convoluted alternative adds a temporary logical column by group, filters on it, and then drops it:
new <- (DT[,index := max(mass) < 45,by=group][force(index)])[,index:=NULL]
I realize that you have specifically asked for a plyr solution, but I thought I would also share an alternative way to do this in base R that does not involve your multi-step approach:
dd[as.logical(ave(dd$mass, dd$group, FUN = function(x) max(x) < 45)), ]
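The ave function is usually handy when dealing with groups in R. Here, it returns a vector as long as dd$mass that flags every row whose group maximum is below 45. Because ave hands those flags back as numeric 0/1 values, I convert them with as.logical and subset the rows where the result is TRUE.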
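To see what ave produces before the subsetting step, here is a small sketch on fixed data. The data frame `toy` and the names `flag` and `keep` are hypothetical and only serve the illustration.

```r
# Hypothetical deterministic data (illustration only, not the original dd)
toy <- data.frame(group = rep(c("a", "b", "c"), each = 2),
                  mass  = c(30, 31, 50, 44, 40, 41))

# ave() applies FUN within each group and returns a vector with one
# entry per row. The logical result is stored in a numeric vector,
# so TRUE/FALSE come back as 1/0.
flag <- ave(toy$mass, toy$group, FUN = function(x) max(x) < 45)
flag  # 1 1 0 0 1 1

# Convert back to logical and keep the rows of the qualifying groups
keep <- toy[as.logical(flag), ]
keep
```

Groups a and c have maxima of 31 and 41, so their rows are kept; group b has a maximum of 50 and is dropped. Unlike the split-based approach, the original row order and row names are preserved.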