I build a factor that has NA as one of its levels.
my_vec <- factor(c(NA,"a","b"),exclude=NULL)
levels(my_vec)
# [1] "a" "b" NA
I change one of those levels.
levels(my_vec)[levels(my_vec) == "b"] <- "c"
NA disappears.
levels(my_vec)
# [1] "a" "c"
How can I keep it?
EDIT
@rawr gave a nice solution that works most of the time. It works for my original example, but not for the one I show below. @Hack-R had a pragmatic option using addNA; I could make it work with that, but I'd rather have a fully general solution.
Here is the generalized issue:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
# [1] "a" "c" # NA disappeared
@rawr's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a" NA "b1" "b2"
attr(my_vec, 'levels')[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
droplevels(my_vec)
# [1] "a" NA "c" "c" # c is duplicated
@Hack-R's solution:
my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
# [1] "a" NA "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
my_vec <- addNA(my_vec)
levels(my_vec)
# [1] "a" "c" NA # NA is at the end
I want levels(my_vec) == c("a",NA,"c")
You have to quote NA; otherwise R treats it as a missing value rather than as a factor level. Factor levels sort alphabetically by default, but that's not always useful, so you can specify a different order by passing the levels in the order you want to the levels argument of factor().
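As a rough illustration of the quoting idea, here is a minimal sketch. The vector x and its values are made up for the example. Note that the quoted "NA" is an ordinary string, so is.na() will not flag it, and that the reordering goes through factor(), since assigning a vector to levels() relabels the values rather than reordering them.

```r
# "NA" in quotes is a plain string, so it is stored as a regular level.
x <- factor(c("NA", "a", "b"), levels = c("a", "NA", "b"))
anyNA(levels(x))                           # FALSE: no missing level involved

# Renaming a level the standard way leaves the "NA" level in place.
levels(x)[levels(x) == "b"] <- "c"
identical(levels(x), c("a", "NA", "c"))    # TRUE

# Reordering: rebuild the factor with the levels in the order you want.
x <- factor(x, levels = c("NA", "a", "c"))
identical(levels(x), c("NA", "a", "c"))    # TRUE
```

The trade-off is that the values are no longer real missing values, so anything relying on is.na() has to compare against the string instead.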
I finally created a function that first replaces the NA value with a temp one (inspired by @lmo), then does the replacement I wanted the standard way, then puts NA back in its place using @rawr's suggestion.
As a bonus, level_sub can be used with na_rep = NULL, which will remove the NA, and it will look good in pipe chains :).
Nevertheless it seems that R really doesn't want you to add NA to factors.
For example, levels(my_vec) <- c(NA,"a") behaves strangely, and it doesn't stop there. While subset keeps NA levels in your columns, rbind quietly removes them! I wouldn't be surprised if further investigation revealed that half of R's functions remove NA levels from factors, making them very unsafe to work with...
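A minimal sketch of what a function like level_sub could look like, following the three steps described above. The signature (x, from, to, na_rep, tmp) and the temporary label are assumptions made for illustration, not the original code.

```r
# Hypothetical sketch: argument names and the tmp label are assumptions.
level_sub <- function(x, from, to, na_rep = NA, tmp = ".__NA__.") {
  stopifnot(is.factor(x))
  lv <- levels(x)
  stopifnot(!tmp %in% lv)          # the temporary label must not clash

  # 1. Hide the NA level behind a temporary label. attr<- is used because
  #    levels<- drops NA entries from the new levels.
  lv[is.na(lv)] <- tmp
  attr(x, "levels") <- lv

  # 2. Do the replacement the standard way; levels<- merges the
  #    duplicated labels into a single level.
  levels(x)[levels(x) %in% from] <- to

  # 3. Put NA back in its place, rename it, or drop it.
  lv <- levels(x)
  if (is.null(na_rep)) {
    levels(x)[lv == tmp] <- NA     # levels<- drops the level; values become NA
  } else {
    lv[lv == tmp] <- na_rep
    attr(x, "levels") <- lv
  }
  x
}

my_vec <- factor(c(NA, "a", "b1", "b2"),
                 levels = c("a", NA, "b1", "b2"), exclude = NULL)
res <- level_sub(my_vec, c("b1", "b2"), "c")
identical(levels(res), c("a", NA, "c"))    # TRUE

res2 <- level_sub(my_vec, c("b1", "b2"), "c", na_rep = NULL)
identical(levels(res2), c("a", "c"))       # TRUE
```

Because the factor is the first argument, a call like this also fits into a pipe chain.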