This question already has an answer here:
-
Subset data frame based on number of rows per group
2 answers
I have a data frame df, and I am trying to subset all rows that have a value in column B
occur more than once in the dataset.
I tried using table to do it, but am having trouble subsetting from the table:
t<-table(df$B)
Then I try subsetting it using:
subset(df, table(df$B)>1)
And I get the error
"Error in x[subset & !is.na(subset)] :
object of type 'closure' is not subsettable"
How can I subset my data frame using table counts?
Here is a dplyr
solution (using mrFlick's data.frame)
library(dplyr)
newd <- dd %>% group_by(b) %>% filter(n()>1) #
newd
# a b
# 1 1 1
# 2 2 1
# 3 5 4
# 4 6 4
# 5 7 4
# 6 9 6
# 7 10 6
Or, using data.table
setDT(dd)[,if(.N >1) .SD,by=b]
Or using base R
dd[dd$b %in% unique(dd$b[duplicated(dd$b)]),]
May I suggest an alternative, faster way to do this with data.table
?
require(data.table) ## 1.9.2
setDT(df)[, .N, by=B][N > 1L]$B
(or) you can couple .I
(another special variable - see ?data.table
) which gives the corresponding row number in df
, along with .N
as follows:
setDT(df)[df[, .I[.N > 1L], by=B]$V1]
(or) have a look at @mnel's another for another variation (using yet another special variable .SD
).
Using table()
isn't the best because then you have to rejoin it to the original rows of the data.frame. The ave
function makes it easier to calculate row-level values for different groups. For example
dd<-data.frame(
a=1:10,
b=c(1,1,2,3,4,4,4,5,6, 6)
)
dd[with(dd, ave(b,b,FUN=length))>1, ]
#subset(dd, ave(b,b,FUN=length)>1) #same thing
a b
1 1 1
2 2 1
5 5 4
6 6 4
7 7 4
9 9 6
10 10 6
Here, for each level of b
, it counts the length of b
, which is really just the number of b
's and returns that back to the appropriate row for each value. Then we use that to subset.