How can I drop unused levels from a data frame?

Posted 2019-03-08 07:22

Given the following mock data:

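set.seed(123)
# stringsAsFactors = TRUE keeps let a factor in R >= 4.0.0,
# where data.frame() no longer converts strings to factors by default
x <- data.frame(let = sample(letters[1:5], 100, replace = TRUE),
                num = sample(1:10, 100, replace = TRUE),
                stringsAsFactors = TRUE)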
y <- subset(x, let != 'a')

Creating a table of y$let yields

a  b  c  d  e 
0 20 21 22 18
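To see why a still shows up, here is a minimal sketch using a small hand-made factor instead of the sampled data (the values are hypothetical, not the counts above): subset() removes the rows, but the factor keeps its full level set.

```r
# Hypothetical, deterministic stand-in for the sampled mock data
x <- data.frame(let = factor(c("a", "b", "c", "a", "d", "e", "b")),
                num = 1:7)
y <- subset(x, let != "a")

# The rows with "a" are gone, but the level "a" is still defined
levels(y$let)   # "a" "b" "c" "d" "e"
table(y$let)    # a has a count of 0
```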

But I don't want level a to show up anymore. If I try this:

levels(y$let) <- factor(y$let)

it messes up the frequencies: table(y$let) now gives me

b  d  c  e 
0 20 21 40 
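The counts get scrambled because levels<- does not drop anything: it relabels the existing levels by position, and duplicated labels merge levels together. A small sketch with a hypothetical factor f shows both effects.

```r
f <- factor(c("b", "c", "c", "d"), levels = c("a", "b", "c", "d"))
table(f)   # a 0, b 1, c 2, d 1

# Positional relabelling: old "a" -> "b", old "b" -> "c", and so on
g <- f
levels(g) <- c("b", "c", "d", "e")
table(g)   # b 0, c 1, d 2, e 1: counts now sit under the wrong labels

# Duplicated labels merge levels: old "c" and old "d" both become "d"
h <- f
levels(h) <- c("b", "c", "d", "d")
table(h)   # b 0, c 1, d 3
```

Merging via duplicated labels is most likely what produced the inflated count for e in the output above.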

I'm aware I could use xtabs(~ y$let, drop.unused.levels = TRUE) to work around the problem, but that doesn't reset the variable's levels at their core (which matters to me, since this is an early change to the dataset that will carry through the whole analysis). Moreover, xtabs produces a different class from table, which will give me headaches later in the project.

The question is: how can I automatically change levels(y$let) so it doesn't show levels that were dropped when I created the subset? In this case, how can I make it show [1] "b" "c" "d" "e"?

Tags: r levels
3 answers
ら.Afraid
Post #2 · 2019-03-08 07:44

Adding to Hong Ooi's answer, here is an example I found on R-Bloggers.

# Create some fake data
x <- as.factor(sample(head(colors()), 100, replace = TRUE))
levels(x)
x <- x[x!="aliceblue"]
levels(x) # still the same levels
table(x) # even though one level has 0 entries!

The solution is simple: run factor() again:
x <- factor(x)
levels(x)
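A point worth knowing about re-running factor(): it keeps the existing order of the surviving levels and, by default (ordered = is.ordered(x)), keeps an ordered factor ordered. The sketch below uses hypothetical factors sizes and o to check both.

```r
# Hypothetical factor with a custom, non-alphabetical level order
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large", "huge"))
sizes <- sizes[sizes != "medium"]

levels(sizes)           # "small" "medium" "large" "huge"
levels(factor(sizes))   # "small" "large": unused levels gone, order kept

# Ordered factors stay ordered after re-factoring
o <- factor(c("lo", "hi"), levels = c("lo", "mid", "hi"), ordered = TRUE)
is.ordered(factor(o))   # TRUE
levels(factor(o))       # "lo" "hi"
```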
我只想做你的唯一
Post #3 · 2019-03-08 07:45

Base R has a function for exactly this, droplevels() (added in R 2.12.0). Applied to a data frame, it drops unused levels from every factor column:

y <- droplevels(y)
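A minimal sketch of droplevels() on a data frame with two factor columns (the data frame dat and its columns are hypothetical): after subsetting, both columns carry stale levels, and a single droplevels() call cleans them all. It also works on a single factor.

```r
dat <- data.frame(let = factor(c("a", "b", "c", "b")),
                  grp = factor(c("x", "y", "y", "z")),
                  num = 1:4)
dat_sub <- subset(dat, let != "a")
sapply(dat_sub[c("let", "grp")], nlevels)   # both columns still have 3 levels

dat_sub <- droplevels(dat_sub)   # every factor column is re-levelled
levels(dat_sub$let)              # "b" "c"
levels(dat_sub$grp)              # "y" "z"

# droplevels() also accepts a single factor
droplevels(dat$let[dat$let != "a"])
```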
来,给爷笑一个
Post #4 · 2019-03-08 07:56

Just do y$let <- factor(y$let). Running factor on an existing factor variable will reset the levels to only those that are present.
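Applied to the question's own mock data, a sketch of both approaches follows (stringsAsFactors = TRUE is added so let is a factor in R >= 4.0.0; the exact counts depend on the R version's sampling algorithm, so only the levels are shown).

```r
set.seed(123)
x <- data.frame(let = sample(letters[1:5], 100, replace = TRUE),
                num = sample(1:10, 100, replace = TRUE),
                stringsAsFactors = TRUE)

# Option 1: re-factor just the one column
y <- subset(x, let != "a")
y$let <- factor(y$let)
levels(y$let)    # "b" "c" "d" "e"

# Option 2: drop unused levels in every factor column at once
y2 <- droplevels(subset(x, let != "a"))
levels(y2$let)   # "b" "c" "d" "e"
```

Option 1 touches only let; Option 2 also re-levels any other factor columns, which may or may not be what you want later in the analysis.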
