Recoding variables in R with a lookup table

Posted 2019-04-16 16:14

Question:

I have a question about recoding data. I would like to use a lookup table, and I am wondering how to recode NA and how to match several values at once, similar to %in%.

Sample data:

gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)
df_gender <- as.data.frame(gender)
df_gender$gender <- as.character(gender)

My first approach to recode is:

df_gender$gender[df_gender$gender == "Female"] <- "F"
df_gender$gender[df_gender$gender == "Male"] <- "M"
df_gender$gender[df_gender$gender %in% c("Unknown", "Not Disclosed", NA)] <- "Missing"

This approach works. However, it is tedious when there are many variables, and it leads to many lines of code. I would like to use a lookup table instead, as in this second approach I tried:

df_gender2 <- as.data.frame(gender)
df_gender2$gender <- as.character(gender)

gender_lookup <- c(Female = "F", Male = "M", Unknown = "Missing", "Not Disclosed" = "Missing")
df_gender2$gender <- gender_lookup[df_gender2$gender]

This works, but does not recode NA to "Missing". First, is there a way to map both "Not Disclosed" and "Unknown" to "Missing" without typing each one separately? Second, using a lookup table, is there a way to also recode NA to "Missing"?
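
For illustration, here is one possible sketch in base R of both points raised above. It is not from the original post: the names recode_groups and recoded are hypothetical. It assumes the lookup can be written as a list grouped by target code, and that any value left as NA after the lookup (including values absent from the table) should become "Missing".

```r
gender <- c("Female", "Not Disclosed", "Unknown", "Male", "Male", "Female", NA)

# Hypothetical helper structure: raw values grouped by target code,
# so "Missing" is typed only once.
recode_groups <- list(
  F = "Female",
  M = "Male",
  Missing = c("Unknown", "Not Disclosed")
)

# Flatten into a named lookup vector: names are raw values, values are codes.
gender_lookup <- setNames(
  rep(names(recode_groups), times = lengths(recode_groups)),
  unlist(recode_groups, use.names = FALSE)
)

# Indexing by NA (or by a value not in the table) yields NA.
recoded <- unname(gender_lookup[gender])

# Assumption: everything still NA after the lookup should be "Missing".
recoded[is.na(recoded)] <- "Missing"

print(recoded)
```

Note that the final step also sends unexpected raw values to "Missing"; if those should be caught instead, replace only the positions where is.na(gender) is TRUE.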

Tags: r subset recode