This snippet:
names<-c("Alice","Bob","Charlie")
ages<-c(25,24,25)
friends<-data.frame(names,ages)
a25 <- friends[friends$ages==25,]
a25
table(a25$names)
gives me this output:
    names ages
1   Alice   25
3 Charlie   25

  Alice     Bob Charlie
      1       0       1
Why is "Bob" in the output, given that the data frame a25 does not include "Bob"? I would have expected output like this (from the table command):
Alice Charlie
1 1
What am I missing?
My environment:
R version 2.15.2 (2012-10-26)
Platform: i386-w64-mingw32/i386 (32-bit)
This question appears to have an answer in the comments. This answer shares one additional approach and consolidates the suggestions from the comments.
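The problem you describe is as follows: there is no "Bob" in your a25$names variable, but when you use table, "Bob" shows up. This is because names is a factor (in this version of R, data.frame converts character vectors to factors by default), and subsetting a factor retains all the levels present in the original column, including levels that no longer occur in the subset.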
table(a25$names)
#
# Alice Bob Charlie
# 1 0 1
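To see the retained levels directly, it can help to inspect the factor itself. The following is a minimal sketch, not part of the original question: it rebuilds the example with stringsAsFactors = TRUE spelled out (an assumption made here so the behavior matches older R versions, where that was the default, regardless of which version you run).

```r
# Rebuild the example data. stringsAsFactors = TRUE is stated explicitly
# (an assumption for this sketch) so the result does not depend on the
# default of the R version in use.
friends <- data.frame(names = c("Alice", "Bob", "Charlie"),
                      ages = c(25, 24, 25),
                      stringsAsFactors = TRUE)
a25 <- friends[friends$ages == 25, ]

class(a25$names)    # the column is a factor, not a character vector
levels(a25$names)   # the level set still lists every original name
nlevels(a25$names)  # number of levels, which exceeds nrow(a25)
```

Because table counts every level of a factor, a level with no remaining rows is reported with a count of 0.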
Fortunately, there's a function called droplevels that takes care of situations like this:
table(droplevels(a25$names))
#
# Alice Charlie
# 1 1
The droplevels function can work on a data.frame too, allowing you to do the following:
a25alt <- droplevels(friends[friends$ages==25,])
a25alt
# names ages
# 1 Alice 25
# 3 Charlie 25
table(a25alt$names)
#
# Alice Charlie
# 1 1
As mentioned in the comments, also look at as.character and factor, both of which discard the unused level as well:
table(as.character(a25$names))
table(factor(a25$names))
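The two calls give the same counts but leave you with different kinds of objects, which may matter if you keep working with the result. A rough sketch of the difference, again assuming names is a factor (stringsAsFactors = TRUE is set explicitly here for that reason; the variable names as_chr and as_fct are made up for illustration):

```r
# Same example data as in the question, with the factor conversion made explicit.
friends <- data.frame(names = c("Alice", "Bob", "Charlie"),
                      ages = c(25, 24, 25),
                      stringsAsFactors = TRUE)
a25 <- friends[friends$ages == 25, ]

as_chr <- as.character(a25$names)  # plain character vector, carries no levels
as_fct <- factor(a25$names)        # still a factor, rebuilt from the values in use

class(as_chr)
class(as_fct)
levels(as_fct)                     # only the names that actually occur in a25

# Compare against droplevels on the same column.
identical(levels(as_fct), levels(droplevels(a25$names)))
```

If you want to keep the column as a factor for later modelling or plotting, factor or droplevels is probably the better fit; as.character is simpler when you only need the counts.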