I have a data frame with numeric and ordered factor columns. It contains a lot of NA values, which have no level assigned to them. I changed NA to "No Answer", but the levels of the factor columns don't contain that level. Here is how I started, but I don't know how to finish it in an elegant way:
addNoAnswer = function(df) {
factorOrNot = sapply(df, is.factor)
levelsList = lapply(df[, factorOrNot], levels)
levelsList = lapply(levelsList, function(x) c(x, "No Answer"))
...
Is there a way to directly apply new levels to factor columns, for example, something like this:
df[, factorOrNot] = lapply(df[, factorOrNot], factor, levelsList)
Of course, this doesn't work correctly.
I want the order of levels preserved and "No Answer" level added to last place.
You could define a function that adds the levels to a factor, but just returns anything else:
addNoAnswer <- function(x){
if(is.factor(x)) return(factor(x, levels=c(levels(x), "No Answer")))
return(x)
}
Then you just lapply this function to your columns:
df <- as.data.frame(lapply(df, addNoAnswer))
That should return what you want.
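As a quick check, here is a minimal sketch on a made-up data frame (the column names `rating` and `score` are invented for illustration). It repeats `addNoAnswer` so the block runs on its own, and it uses `df[] <- lapply(...)`, which replaces the columns while keeping the object a data frame, as an alternative to `as.data.frame`.

```r
# Same function as above, repeated so the block is self-contained
addNoAnswer <- function(x) {
  if (is.factor(x)) return(factor(x, levels = c(levels(x), "No Answer")))
  return(x)
}

# Hypothetical data: one ordered factor and one numeric column, both with NA
df <- data.frame(
  rating = factor(c("low", NA, "high", "mid"),
                  levels = c("low", "mid", "high"), ordered = TRUE),
  score  = c(1.5, NA, 3, 2)
)

# df[] keeps the data.frame structure while replacing every column
df[] <- lapply(df, addNoAnswer)

levels(df$rating)      # "low" "mid" "high" "No Answer"
is.ordered(df$rating)  # TRUE: the ordering is carried over

# The level exists now, so the NAs in the factor can be filled in
df$rating[is.na(df$rating)] <- "No Answer"
```

Note that the function only adds the level. The NA values in the factor still have to be replaced afterwards, as in the last line, and the NA in the numeric column is left alone.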
The levels function also has a replacement form, levels(x) <- value, so it is very easy to add new levels:
f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
str(f1)
Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
levels(f1) <- c(levels(f1),"No Answer")
f1[is.na(f1)] <- "No Answer"
str(f1)
Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
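The order of the two steps matters. A small sketch (the vector `f` is made up for illustration) of what happens when the value is assigned before the level exists:

```r
f <- factor(c("a", NA, "b"))

# Assigning a value that is not a level yields NA plus a warning
g <- f
g[is.na(g)] <- "No Answer"   # warning: invalid factor level, NA generated
anyNA(g)                     # TRUE, nothing was replaced

# Extending the levels first makes the same assignment work
levels(f) <- c(levels(f), "No Answer")
f[is.na(f)] <- "No Answer"
as.character(f)              # "a" "No Answer" "b"
```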
You can then loop it around all variables in a data.frame:
f1 <- factor(c("a", "a", NA, NA, "b", NA, "a", "c", "a", "c", "b"))
f2 <- factor(c("c", NA, "b", NA, "b", NA, "c" ,"a", "d", "a", "b"))
f3 <- factor(c(NA, "b", NA, "b", NA, NA, "c", NA, "d" , "e", "a"))
df1 <- data.frame(f1,n1=1:11,f2,f3)
str(df1)
'data.frame': 11 obs. of 4 variables:
$ f1: Factor w/ 3 levels "a","b","c": 1 1 NA NA 2 NA 1 3 1 3 ...
$ n1: int 1 2 3 4 5 6 7 8 9 10 ...
$ f2: Factor w/ 4 levels "a","b","c","d": 3 NA 2 NA 2 NA 3 1 4 1 ...
$ f3: Factor w/ 5 levels "a","b","c","d",..: NA 2 NA 2 NA NA 3 NA 4 5 ...
for(i in 1:ncol(df1)) if(is.factor(df1[,i])) levels(df1[,i]) <- c(levels(df1[,i]),"No Answer")
df1[is.na(df1)] <- "No Answer"
str(df1)
'data.frame': 11 obs. of 4 variables:
$ f1: Factor w/ 4 levels "a","b","c","No Answer": 1 1 4 4 2 4 1 3 1 3 ...
$ n1: int 1 2 3 4 5 6 7 8 9 10 ...
$ f2: Factor w/ 5 levels "a","b","c","d",..: 3 5 2 5 2 5 3 1 4 1 ...
$ f3: Factor w/ 6 levels "a","b","c","d",..: 6 2 6 2 6 6 3 6 4 5 ...
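One caveat with `df1[is.na(df1)] <- "No Answer"`: it touches every column, so a numeric column that contains NA would be coerced to character. That does not happen in the example because `n1` has no missing values. A sketch that limits the replacement to factor columns (the data frame `d` is made up for illustration):

```r
# Hypothetical data: a factor and a numeric column, each with one NA
d <- data.frame(
  f = factor(c("a", NA, "b")),
  n = c(1, NA, 3)
)

for (col in names(d)) {
  if (is.factor(d[[col]])) {
    # add the level, then fill the NAs of this factor column only
    levels(d[[col]]) <- c(levels(d[[col]]), "No Answer")
    d[[col]][is.na(d[[col]])] <- "No Answer"
  }
}

str(d)   # f gains the "No Answer" level, n stays numeric with its NA
```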
Since this question was last answered, this has become possible using fct_explicit_na() from the forcats package. Here is the example given in the documentation:
f1 <- factor(c("a", "a", NA, NA, "a", "b", NA, "c", "a", "c", "b"))
table(f1)
# f1
# a b c
# 4 2 2
f2 <- forcats::fct_explicit_na(f1)
table(f2)
# f2
# a b c (Missing)
# 4 2 2 3
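The default value is (Missing), but this can be changed via the na_level argument. Note that since forcats 1.0.0, fct_explicit_na() is deprecated in favour of fct_na_value_to_level().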
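For comparison, a base R sketch that should give a similar result without forcats: `addNA()` turns NA into a real level, which can then be renamed through `levels<-`. The label "No Answer" is taken from the question; the vector is the one from the example above.

```r
f1 <- factor(c("a", "a", NA, NA, "a", "b", NA, "c", "a", "c", "b"))

# addNA() appends NA as the last level, keeping the existing order
f2 <- addNA(f1)

# Rename the NA level to a visible label
levels(f2)[is.na(levels(f2))] <- "No Answer"

levels(f2)   # "a" "b" "c" "No Answer"
table(f2)
```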
Expanding on ilir's answer and its comment, you can check that a column is a factor and that it does not already contain the new level, then add the level. This makes the function re-runnable:
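addLevel <- function(x, newlevel = NULL) {
  if (is.factor(x)) {
    # keep only the levels that are not present yet; this is empty when
    # newlevel is NULL or already a level, so re-running changes nothing
    toAdd <- setdiff(newlevel, levels(x))
    if (length(toAdd) > 0)
      return(factor(x, levels = c(levels(x), toAdd)))
  }
  return(x)
}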
You can then apply it like so:
dataFrame$column <- addLevel(dataFrame$column, "newLevel")
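To illustrate the re-runnable part, here is a small sketch. The function is repeated so the block runs on its own, written with `setdiff()` so that the default `newlevel = NULL` is simply a no-op; the factor `f` is made up for illustration.

```r
addLevel <- function(x, newlevel = NULL) {
  if (is.factor(x)) {
    # only levels that are not already present get appended
    toAdd <- setdiff(newlevel, levels(x))
    if (length(toAdd) > 0)
      return(factor(x, levels = c(levels(x), toAdd)))
  }
  x
}

f <- factor(c("a", "b", NA))
once  <- addLevel(f, "newLevel")
twice <- addLevel(once, "newLevel")   # second call leaves the factor as is

levels(once)              # "a" "b" "newLevel"
identical(once, twice)    # TRUE
addLevel(1:3, "newLevel") # non-factors are returned unchanged
```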
Another approach is to convert the column to character, assign the new value where the condition holds, and finally convert the column back to a factor. Note that as.factor() sorts the levels alphabetically, so a custom level order is not preserved.
Steps
1. First, convert the factor column to character:
df$column2 <- as.character(df$column2)
2. Add the new level:
df$column2[df$column1 == "XYZ"] <- "new_level"
3. Convert to factor again:
df$column2 <- as.factor(df$column2)
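If the original level order matters, as it does in the question, one option is to remember the levels before converting and pass them back to `factor()` explicitly. A minimal sketch on invented data (the values in `column1` and `column2` are placeholders):

```r
# Hypothetical data with a factor whose levels are not in alphabetical order
df <- data.frame(
  column1 = c("ABC", "XYZ", "XYZ"),
  column2 = factor(c("low", "high", NA), levels = c("low", "high")),
  stringsAsFactors = FALSE
)

oldLevels <- levels(df$column2)                 # remember the original order
df$column2 <- as.character(df$column2)          # step 1
df$column2[df$column1 == "XYZ"] <- "new_level"  # step 2
# step 3: explicit levels keep the order; as.factor() would sort them
df$column2 <- factor(df$column2, levels = c(oldLevels, "new_level"))

levels(df$column2)   # "low" "high" "new_level"
```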
I have a very simple answer that may not directly address your specific scenario, but it is a simple way to do this in general:
levels(df$column) <- c(levels(df$column), newFactorLevel)