Filtering a data frame in R and an unwanted filter

Posted 2019-03-01 04:42

Question:

This snippet:

names<-c("Alice","Bob","Charlie")
ages<-c(25,24,25)
friends<-data.frame(names,ages)
a25 <- friends[friends$ages==25,]
a25
table(a25$names)

gives me this output:

    names ages
1   Alice   25
3 Charlie   25

  Alice     Bob Charlie 
      1       0       1

Why is "Bob" in the output, given that the data frame a25 does not include "Bob"? I would have expected output like this from the table command:

  Alice  Charlie 
      1        1 

What am I missing?

My environment:

R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)

Answer 1:

This question appears to have an answer in the comments. This answer shares one additional approach and consolidates the suggestions from the comments.

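The problem you describe is as follows: there is no "Bob" in your "a25$names" variable, but when you use table, "Bob" still shows up with a count of 0. This happens because names is a factor (in R versions before 4.0.0, such as yours, data.frame converts character vectors to factors by default), and subsetting a factor keeps all the levels of the original column, including those that no longer occur in any row.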

table(a25$names)
# 
#   Alice     Bob Charlie 
#       1       0       1 
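
To see the retained levels directly, you can inspect the factor itself. This is only a minimal sketch: it rebuilds the example data and sets stringsAsFactors = TRUE explicitly (an assumption added here) so that it behaves the same on R 4.0.0 and later, where strings are no longer converted to factors by default.

```r
# Rebuild the example. stringsAsFactors = TRUE is set explicitly (assumption)
# to reproduce the pre-4.0.0 default that the question relies on.
friends <- data.frame(
  names = c("Alice", "Bob", "Charlie"),
  ages = c(25, 24, 25),
  stringsAsFactors = TRUE
)
a25 <- friends[friends$ages == 25, ]

is.factor(a25$names)     # TRUE
levels(a25$names)        # "Alice" "Bob" "Charlie": "Bob" is still a level
as.character(a25$names)  # "Alice" "Charlie": the values actually present
```

table counts every level of a factor, which is why "Bob" is reported with a count of 0 instead of being left out.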

Fortunately, there's a function called droplevels that takes care of situations like this:

table(droplevels(a25$names))
# 
#   Alice Charlie 
#       1       1 

The droplevels function can work on a data.frame too, allowing you to do the following:

a25alt <- droplevels(friends[friends$ages==25,])
a25alt
#     names ages
# 1   Alice   25
# 3 Charlie   25
table(a25alt$names)
# 
#   Alice Charlie 
#       1       1 

As mentioned in the comments, wrapping the column in as.character or factor also works:

table(as.character(a25$names))
table(factor(a25$names))
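
Both calls work because the categories are rebuilt from the values that are actually present: as.character drops the factor structure altogether, and calling factor on a factor keeps only the levels in use. A short sketch of the comparison, again assuming the data is rebuilt with stringsAsFactors = TRUE written out; the names by_char and by_factor are purely illustrative.

```r
# Same example data as in the question, with factors forced on (assumption).
friends <- data.frame(
  names = c("Alice", "Bob", "Charlie"),
  ages = c(25, 24, 25),
  stringsAsFactors = TRUE
)
a25 <- friends[friends$ages == 25, ]

by_char <- table(as.character(a25$names))  # counts over plain strings
by_factor <- table(factor(a25$names))      # counts over a re-created factor

names(by_char)      # "Alice" "Charlie"
names(by_factor)    # "Alice" "Charlie"

# Neither call modifies a25 itself; the stored column still has all three levels.
nlevels(a25$names)  # 3
```

If the data frame will be reused, droplevels on the whole data frame is probably the more convenient choice, since it fixes the stored column once instead of at every call to table.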