How can I keep NA when I change levels

2020-04-15 11:41发布

I build a vector of factors containing NA.

my_vec <- factor(c(NA,"a","b"),exclude=NULL)
levels(my_vec)
# [1] "a" "b" NA 

I change one of those levels.

levels(my_vec)[levels(my_vec) == "b"] <- "c"

NA disappears.

levels(my_vec)
# [1] "a" "c"

How can I keep it ?


EDIT

@rawr gave a nice solution that can work most of the time, it works for my previous specific example, but not for the one I'll show below @Hack-R had a pragmatic option using addNA, I could make it work with that but I'd rather a fully general solution

See this generalized issue

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a"  NA   "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
[1] "a" "c"      # NA disppeared

@rawr's solution:

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a"  NA   "b1" "b2"
attr(my_vec, 'levels')[levels(my_vec) %in% c("b1","b2")] <- "c"
levels(my_vec)
droplevels(my_vec)
[1] "a" NA  "c" "c" # c is duplicated

@Hack-R's solution:

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
levels(my_vec)
[1] "a"  NA   "b1" "b2"
levels(my_vec)[levels(my_vec) %in% c("b1","b2")] <- "c"
my_vec <- addNA(my_vec)
levels(my_vec)
[1] "a" "c" NA     # NA is in the end

I want levels(my_vec) == c("a",NA,"c")

2条回答
Deceive 欺骗
2楼-- · 2020-04-15 11:52

You have to quote NA, otherwise R treats it as a null value rather than a factor level. Factor levels sort alphabetically by default, but obviously that's not always useful, so you can specify a different order by passing a new list order to levels()

require(plyr)
my_vec <- factor(c("NA","a","b1","b2"))
vec2 <- revalue(my_vec,c("b1"="c","b2"="c"))

#now reorder levels

my_vec2 <- factor(vec2, levels(vec2)[c(1,3,2)])

Levels: a NA c
查看更多
家丑人穷心不美
3楼-- · 2020-04-15 11:53

I finally created a function that first replaces the NA value with a temp one (inspired by @lmo), then does the replacement I wanted the standard way, then puts NA back in its place using @rawr's suggestion.

my_vec <- factor(c(NA,"a","b1","b2"),levels = c("a",NA,"b1","b2"),exclude=NULL)
my_vec <- level_sub(my_vec,c("b1","b2"),"c")
my_vec
# 1] <NA> a    c    c   
# Levels: a <NA> c

As a bonus level_sub can be used with na_rep = NULL which will remove the NA, and it will look good in pipe chains :).

level_sub <- function(x,from,to,na_rep = "NA"){
  if(!is.null(na_rep)) {levels(x)[is.na(levels(x))] <- na_rep}
  levels(x)[levels(x) %in% from] <- to
  if(!is.null(na_rep)) {attr(x, 'levels')[levels(x) == na_rep] <- NA}
  x
}

Nevertheless it seems that R really doesn't want you to add NA to factors.

levels(my_vec) <- c(NA,"a") will have a strange behavior but that doesn't stop here. While subset will keep NA levels in your columns, rbind will quietly remove them! I wouldn't be surprised if further investigation revealed that half R functions remove NA factors, making them very unsafe to work with...

查看更多
登录 后发表回答