Given a data frame, how do I replace every occurrence of a particular value across all rows and columns? For example, say I want to replace all empty strings with NAs (without typing out the positions):
df <- data.frame(list(A=c("", "xyz", "jkl"), B=c(12, "", 100)))
    A   B
1      12
2 xyz
3 jkl 100
Expected result:
A B
1 NA 12
2 xyz NA
3 jkl 100
Like this:
> df[df==""]<-NA
> df
A B
1 <NA> 12
2 xyz <NA>
3 jkl 100
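To see why this works: comparing a data frame with a scalar returns a logical matrix of the same shape, and indexing with that matrix selects the matching cells. A minimal sketch, assuming character columns (stringsAsFactors = FALSE is set explicitly here, and mask is just an illustrative variable name):

```r
df <- data.frame(A = c("", "xyz", "jkl"), B = c(12, "", 100),
                 stringsAsFactors = FALSE)

mask <- df == ""   # logical matrix, TRUE where a cell is the empty string
print(mask)

df[mask] <- NA     # assign NA to every cell flagged in the mask
print(colSums(is.na(df)))  # number of NAs per column
```

Keeping the mask in its own variable makes it easy to inspect which cells will be overwritten before changing anything.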
Since PikkuKatja and glallen asked for a more general solution and I cannot comment yet, I'll write an answer. You can combine conditions with the | operator, as in:
> df[df=="" | df==12] <- NA
> df
A B
1 <NA> <NA>
2 xyz <NA>
3 jkl 100
For factors: zxzak's code already yields factors (in R versions before 4.0.0, where stringsAsFactors defaults to TRUE):
> df <- data.frame(list(A=c("","xyz","jkl"), B=c(12,"",100)))
> str(df)
'data.frame': 3 obs. of 2 variables:
$ A: Factor w/ 3 levels "","jkl","xyz": 1 3 2
$ B: Factor w/ 3 levels "","100","12": 3 1 2
If the factors cause trouble, I'd suggest temporarily converting them to character vectors:
df[] <- lapply(df, as.character)
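The full round trip might look like the sketch below. It assumes you want factors back at the end; stringsAsFactors = TRUE is set explicitly so the example behaves the same on R 4.0.0 and later.

```r
df <- data.frame(A = c("", "xyz", "jkl"), B = c(12, "", 100),
                 stringsAsFactors = TRUE)

df[] <- lapply(df, as.character)  # factors -> character, keeps the data.frame shape
df[df == ""] <- NA                # replace empty strings with NA
df[] <- lapply(df, factor)        # back to factors; NA does not become a level

str(df)
```

Rebuilding the factors at the end also drops the now unused "" level.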
We can use data.table to do this quickly.
First, create df without factors:
df <- data.frame(list(A=c("","xyz","jkl"), B=c(12,"",100)), stringsAsFactors=FALSE)
Now you can use
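library(data.table)
setDT(df)  # convert to a data.table by reference
# replace empty strings with NA, one column at a time
for (jj in seq_len(ncol(df))) set(df, i = which(df[[jj]] == ""), j = jj, value = NA)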
and you can convert it back to a data.frame
setDF(df)
If you want to stay with a plain data.frame and keep the factors, it is more difficult. You need to work with the factor levels:
levels(df$value)[levels(df$value)==""] <- NA
where value stands for the name of each column, so you need to run this in a loop over the columns.
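A minimal sketch of such a loop, assuming the data frame from the question and that only factor columns should be touched (col is just an illustrative loop variable):

```r
df <- data.frame(A = c("", "xyz", "jkl"), B = c(12, "", 100),
                 stringsAsFactors = TRUE)

for (col in names(df)) {
  if (is.factor(df[[col]])) {
    # Setting a level to NA turns every element with that level into NA
    levels(df[[col]])[levels(df[[col]]) == ""] <- NA
  }
}

str(df)
```

The columns stay factors throughout, and the "" level disappears from each of them.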
If you want to replace multiple values in a data frame, looping through all columns might help.
Say you want to replace "" and 100:
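# c() coerces 100 to the string "100", so na_codes is a character vector
na_codes <- c(100, "")
for (i in seq_along(df)) {
  # %in% compares as character here, so it matches "100" and "" in any column
  df[[i]][df[[i]] %in% na_codes] <- NA
}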
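If you need this in several places, the loop can be wrapped in a small function. The sketch below is one possible way to do it; replace_values_with_na is a hypothetical helper name, not a base R function, and the example assumes character columns.

```r
# Hypothetical helper: returns a copy of `data` in which every cell
# whose value is in `na_codes` has been replaced by NA.
replace_values_with_na <- function(data, na_codes) {
  for (i in seq_along(data)) {
    data[[i]][data[[i]] %in% na_codes] <- NA
  }
  data
}

df <- data.frame(A = c("", "xyz", "jkl"), B = c(12, "", 100),
                 stringsAsFactors = FALSE)
cleaned <- replace_values_with_na(df, c(100, ""))
print(cleaned)
```

Because R copies on modification, the original df is left unchanged and only cleaned contains the NAs.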