Subset a list - a plyr way?

Posted 2020-06-18 09:52

Question:

I often have data that is grouped by one or more variables, with several observations within each group. From the data frame, I want to select whole groups according to various criteria.

I commonly use a split-sapply-rbind approach, where I extract elements from a list using a logical vector.

Here is a small example. I start with a data frame with one grouping variable ('group'), and I wish to select groups that have a maximum mass of less than 45:

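# Three groups of five observations, with means 30, 50 and 40
dd <- data.frame(group = rep(letters[1:3], each = 5),
                 mass = c(rnorm(5, 30), rnorm(5, 50),
                          rnorm(5, 40)))
dd2 <- split(x = dd, f = dd$group)                      # one data frame per group
dd3 <- dd2[sapply(dd2, function(x) max(x$mass) < 45)]   # keep groups with max mass < 45
dd4 <- do.call(rbind, dd3)                              # bind the kept groups back together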

I have just started to use plyr, and now I wonder:
is there a plyr-only alternative to achieve this?

Answer 1:

At least in this case, the following gives the same result:

library(plyr)
dd5 <- ddply(dd,.(group),function(x) x[max(x$mass)<45,])

all(dd4==dd5)
[1] TRUE
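
The reason this works may not be obvious: the row index inside the function is a single logical value, which R recycles across all rows of the group. Below is a minimal base R sketch of that behaviour, using a made-up data frame `toy` with fixed values (not the original `dd`) so that the result is reproducible.

```r
# Hypothetical toy data with fixed values (an assumption for illustration)
toy <- data.frame(group = rep(c("a", "b"), each = 2),
                  mass  = c(30, 31, 50, 51))

grp_a <- toy[toy$group == "a", ]
grp_b <- toy[toy$group == "b", ]

# A single TRUE in the row position is recycled to every row: all rows are kept
kept <- grp_a[max(grp_a$mass) < 45, ]

# A single FALSE is recycled as well: zero rows are returned, columns stay intact
dropped <- grp_b[max(grp_b$mass) < 45, ]

nrow(kept)     # 2
nrow(dropped)  # 0
```

ddply then binds the per-group pieces together, so groups that return zero rows simply disappear from the result.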


Answer 2:

Here is a data.table solution, for coding elegance:

library(data.table)
DT <- data.table(dd)

DT[,if(max(mass) < 45){.SD},by=group]
    group     mass
 1:     a 28.80426
 2:     a 31.31232
 3:     a 29.47599
 4:     a 30.35425
 5:     a 29.92833
 6:     c 40.11349
 7:     c 40.17431
 8:     c 39.94652
 9:     c 39.57524
10:     c 40.20791

Perhaps slightly more convoluted: flag each group by reference, filter on the flag, then drop the helper column again:

new <- (DT[,index := max(mass) < 45,by=group][force(index)])[,index:=NULL]


Answer 3:

I realize that you have specifically asked for a plyr solution, but I thought I would also share an alternative way to do this in base R that does not involve your multi-step approach:

dd[as.logical(ave(dd$mass, dd$group, FUN = function(x) max(x) < 45)), ]

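The ave function is often handy when working with groups in R: it returns a vector as long as its input, with each group's result repeated for every row of that group. Here it yields 1 for rows in groups whose maximum mass is below 45 and 0 otherwise; as.logical() turns that into a logical vector, which is then used to subset the rows.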
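
To see what ave hands back before the subsetting step, here is a small sketch on a made-up data frame `toy` with fixed values (an assumption for illustration, not the original `dd`):

```r
# Hypothetical toy data with fixed values
toy <- data.frame(group = rep(c("a", "b"), each = 2),
                  mass  = c(30, 31, 50, 51))

# ave() returns one value per row; the per-group logical is coerced to 0/1
# because it is written back into a numeric vector
flag <- ave(toy$mass, toy$group, FUN = function(x) max(x) < 45)
flag                              # 1 1 0 0

# Convert to logical and keep only the rows of qualifying groups
result <- toy[as.logical(flag), ]
result$group                      # "a" "a"
```

Because the flag has one entry per row, the original row order and row names are preserved, unlike with the split/rbind approach.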



Tags: r plyr