Suppose I have a data frame (df1) of factors like this:
factor1 factor2 factor3
------- ------- -------
d a x
d a x
b a x
b c x
b c y
c c y
c n y
c n y
c n y
I want to drop every factor column in which at least one level has fewer than 3 observations.
In this data frame, factor1 has 3 levels (d, b and c), but level d occurs only 2 times, so I want to drop factor1.
The resulting data frame should be:
factor2 factor3
------- -------
a x
a x
a x
c x
c y
c y
n y
n y
n y
How can I do this using R? I will be very glad for any help. Thanks a lot.
You could try using lapply and table:
df1[, lapply(c(1,2,3), FUN = function(x) min(table(df1[,x]))) >= 3]
Or, a little more generally:
df1[, lapply(1:ncol(df1), FUN = function(x) min(table(df1[,x]))) >= 3]
Is that what you want?
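For a self-contained check, here is a minimal sketch of the same idea using sapply, which returns a logical vector directly. The construction of df1 is my transcription of the table in the question, and drop = FALSE is an addition so the result stays a data frame even if only one column survives.

```r
# df1 rebuilt from the table in the question (assumed transcription)
df1 <- data.frame(
  factor1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
  factor2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n")),
  factor3 = factor(c("x", "x", "x", "x", "y", "y", "y", "y", "y"))
)

# TRUE for columns whose rarest level occurs at least 3 times
keep <- sapply(df1, function(x) min(table(x)) >= 3)

# drop = FALSE keeps a data frame even when a single column remains
result <- df1[, keep, drop = FALSE]
names(result)
```

With the data above, only factor1 fails the test, because level d occurs twice.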
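# stringsAsFactors = TRUE is needed in R >= 4.0, where strings are no longer factors by default
df <- data.frame(col1 = rep(letters[1:4], each = 3),
                 col2 = rep(letters[1:2], each = 6),
                 col3 = rep(letters[1:3], each = 4),
                 stringsAsFactors = TRUE)
# Keeps columns with more than 2 levels; use min(table(x)) > 2 to test observations per level instead
df[, sapply(df, function(x) nlevels(x) > 2)]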
We could use Filter:
Filter(function(x) min(nlevels(x))>2, df1)
(based on the results in one of the upvoted posts)
Or, to count the observations per level rather than the number of levels:
Filter(function(x) min(tabulate(x))>2, df1)
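The two Filter calls select different columns, which a small sketch on the question's data makes visible. Here df1 is my transcription of the table in the question, and the names by_levels and by_counts are only for illustration.

```r
# df1 rebuilt from the table in the question (assumed transcription)
df1 <- data.frame(
  factor1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
  factor2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n")),
  factor3 = factor(c("x", "x", "x", "x", "y", "y", "y", "y", "y"))
)

# nlevels() counts how many distinct levels a factor has
by_levels <- Filter(function(x) nlevels(x) > 2, df1)

# tabulate() counts how many observations fall in each level
by_counts <- Filter(function(x) min(tabulate(x)) > 2, df1)

names(by_levels)  # "factor1" "factor2"
names(by_counts)  # "factor2" "factor3"
```

Only the tabulate version matches the result asked for in the question, since factor1 has three levels but level d occurs just twice.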
I would create a quick helper function that uses table() to count how many observations each level has; look at table(df$fac1) to see how this works. Note this isn't very robust, but it should get you started:
df <- data.frame(fac1 = factor(c("d", "d", "b", "b", "b", "c", "c", "c", "c")),
                 fac2 = factor(c("a", "a", "a", "c", "c", "c", "n", "n", "n")),
                 fac3 = factor(c(rep("x", 4), rep("y", 5))),
                 other = 1:9)

# Returns TRUE if the column should be kept
at_least_three_instances <- function(column) {
  if (is.factor(column)) {
    # Keep a factor only if its rarest level occurs at least 3 times
    min(table(column)) > 2
  } else {
    # Non-factor columns are always kept
    TRUE
  }
}

# Subset the columns for which the helper returns TRUE
df[unlist(lapply(df, at_least_three_instances))]
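One way the helper above can misbehave (this is my reading of "not very robust", not something the answer spells out) is with unused factor levels: table() reports them with a count of 0, so the column is dropped even though every observed level is frequent enough. A hedged sketch that calls droplevels() first; the function name at_least_three_observed is hypothetical.

```r
# Hypothetical variant of the helper: ignores levels that never occur
at_least_three_observed <- function(column) {
  if (!is.factor(column)) return(TRUE)
  counts <- table(droplevels(column))
  # A factor with no observed levels is not kept
  length(counts) > 0 && min(counts) > 2
}

# Example: level "z" is declared but never observed
f <- factor(c("a", "a", "a", "b", "b", "b"), levels = c("a", "b", "z"))

min(table(f)) > 2            # FALSE, because "z" has a count of 0
at_least_three_observed(f)   # TRUE
```

Whether unused levels should count against a column depends on the data, so treat this as one option rather than a fix.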