I want to delete certain rows in a data frame when the number of rows with the same index is smaller than a pre-specified value.
> fof.6.5[1:15, 1:3]
draw Fund.ID Firm.ID
1 1 1667 666
2 1 1572 622
3 1 1392 553
4 1 248 80
5 1 3223 332
6 2 2959 1998
7 2 2659 1561
8 2 14233 2517
9 2 10521 12579
10 2 3742 1045
11 3 9093 10121
12 3 15681 21626
13 3 26371 70170
14 4 27633 52720
15 4 13751 656
In this example, I want each index to have 5 rows. The third draw (which is my index) has fewer than 5 rows. How can I delete the draws like the third one if they have fewer than 5 rows?
You could do this using dplyr
(assuming your data is in a data frame called dt
:
dt %>% group_by(draw) %>% filter(n() >= 5) %>% ungroup()
Or you could use table
or xtabs
:
tab <- xtabs(~ draw, dt)
dt[!dt$draw %in% as.numeric(names(which(tab < 5))), ]
Here is another option using data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
, grouped by 'draw', if
the nrows (.N
) are greater than or equal to 'n' then get the Subset of Data.table (.SD
)
library(data.table)
n <- 5
setDT(df1)[, if(.N >= n) .SD, by = draw]
# draw Fund.ID Firm.ID
# 1: 1 1667 666
# 2: 1 1572 622
# 3: 1 1392 553
# 4: 1 248 80
# 5: 1 3223 332
# 6: 2 2959 1998
# 7: 2 2659 1561
# 8: 2 14233 2517
# 9: 2 10521 12579
#10: 2 3742 1045
If we want only 'n' number of rows, use the head
setDT(df1)[, if(.N >= n) head(.SD, n), by = draw]
Code below makes that:
library(dplyr)
fof.6.5 <- data.frame(draw = rep(1:4, c(5, 5, 3, 2)),
Fun.ID = rnorm(15),
Firm.ID = rnorm(15))
fof_filter <- fof.6.5 %>% group_by(draw) %>% mutate(freq = n()) %>%
filter(freq >= 5) %>% select(-freq)
fof_filter
group_by
join (in someway) profiles with the same value of draw
mutate
add a new variable "freq" that give the number of profiles
per each group of draw and repeats it.
filter
selects only those
profiles with "freq" greater or equals to 5
select
delete the "freq"
column because is not needed anymore