I have a data frame with 5 different columns:
Test1 Test2 Test3 Test4 Test5
Sample1 PASS PASS FAIL WARN WARN
Sample2 PASS PASS FAIL PASS WARN
Sample3 PASS FAIL FAIL PASS WARN
Sample4 PASS FAIL FAIL PASS WARN
Sample5 PASS WARN FAIL WARN WARN
In each column, each level is assigned a different integer code.
In column 1, "PASS" is 1.
In column 2, "PASS" is 2 and "FAIL" is 1.
In column 3, "FAIL" is 1.
In column 4, "PASS" is 1 and "WARN" is 2.
In column 5, "WARN" is 1.
The codes are being assigned in alphabetical order.
I need "PASS" to be 1 in all columns, "WARN" to be 2 in all columns, and "FAIL" to be 3 in all columns, so that I can then convert the data frame into a matrix and turn it into a heatmap.
Currently the codes are assigned to the levels depending on which values show up in a specific column, and in alphabetical order.
How can I keep it constant throughout the entire data frame?
You could put the levels of every column in the dataset "df" in the same order by looping over the columns (lapply), converting each one to a factor again with the specified levels, and assigning the result back to the corresponding columns.
lvls <- c('PASS', 'WARN', 'FAIL')
df[] <- lapply(df, factor, levels=lvls)
str(df)
# 'data.frame': 5 obs. of 5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
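With the levels aligned, the matrix and heatmap step mentioned in the question could look roughly like the sketch below. The small data frame, the colour choice, and the break points are assumptions for illustration, not part of the original data.

```r
# Small stand-in for the data frame in the question (assumed layout).
df <- data.frame(
  Test1 = c("PASS", "PASS", "PASS"),
  Test2 = c("PASS", "FAIL", "WARN"),
  Test3 = c("FAIL", "FAIL", "FAIL"),
  row.names = c("Sample1", "Sample2", "Sample3"),
  stringsAsFactors = TRUE
)

lvls <- c("PASS", "WARN", "FAIL")
df[] <- lapply(df, factor, levels = lvls)

# data.matrix() replaces each factor by its integer codes,
# so PASS = 1, WARN = 2, FAIL = 3 in every column.
m <- data.matrix(df)
print(m)

# One colour per level. Explicit breaks keep the colour mapping fixed
# even when some level does not occur in the data.
heatmap(m, Rowv = NA, Colv = NA, scale = "none",
        col = c("green", "orange", "red"),
        breaks = c(0.5, 1.5, 2.5, 3.5))
```

Rowv = NA and Colv = NA switch off the clustering so the samples and tests stay in their original order.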
If you opt to use data.table:
library(data.table)
setDT(df)[, names(df):= lapply(.SD, factor, levels=lvls)]
setDT converts the "data.frame" to a "data.table" by reference. The := operator then assigns the reconverted factor columns (lapply(..)) to the column names of the dataset. .SD denotes "Subset of Data.table".
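If the table also holds columns that should not become factors, .SDcols can restrict the conversion to the test columns. This is a minimal sketch; the "batch" column and the name test_cols are hypothetical and only serve the illustration.

```r
library(data.table)

# Hypothetical table: 'batch' is an extra column that should stay untouched.
dt <- data.table(
  batch = c("a", "b", "c"),
  Test1 = c("PASS", "PASS", "WARN"),
  Test2 = c("FAIL", "PASS", "WARN")
)

lvls <- c("PASS", "WARN", "FAIL")
test_cols <- c("Test1", "Test2")

# Wrapping test_cols in parentheses makes := use the names stored in the
# variable; .SDcols limits .SD to those same columns.
dt[, (test_cols) := lapply(.SD, factor, levels = lvls), .SDcols = test_cols]

str(dt)
```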
data
df <- structure(list(Test1 = structure(c(1L, 1L, 1L, 1L, 1L),
.Label = "PASS", class = "factor"),
Test2 = structure(c(2L, 2L, 1L, 1L, 3L), .Label = c("FAIL",
"PASS", "WARN"), class = "factor"), Test3 = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "FAIL", class = "factor"), Test4 =
structure(c(2L, 1L, 1L, 1L, 2L), .Label = c("PASS", "WARN"),
class = "factor"), Test5 = structure(c(1L, 1L, 1L, 1L, 1L), .Label =
"WARN", class = "factor")), .Names = c("Test1",
"Test2", "Test3", "Test4", "Test5"), row.names = c("Sample1",
"Sample2", "Sample3", "Sample4", "Sample5"), class = "data.frame")
Using dplyr:
library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() replaces them
df <- df %>% mutate(across(everything(), ~ factor(.x, levels = c('PASS', 'WARN', 'FAIL'))))
You get:
#> str(df)
#'data.frame': 5 obs. of 5 variables:
# $ Test1: Factor w/ 3 levels "PASS","WARN",..: 1 1 1 1 1
# $ Test2: Factor w/ 3 levels "PASS","WARN",..: 1 1 3 3 2
# $ Test3: Factor w/ 3 levels "PASS","WARN",..: 3 3 3 3 3
# $ Test4: Factor w/ 3 levels "PASS","WARN",..: 2 1 1 1 2
# $ Test5: Factor w/ 3 levels "PASS","WARN",..: 2 2 2 2 2
A more general approach, supposing your data.frame can contain other string values and NA:
library(magrittr)
fac = df %>% as.matrix %>% as.vector %>% unique
df1 = data.frame(lapply(df, factor, levels = fac[!is.na(fac)]))
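To see what this general approach produces, here is a small sketch written with base R calls only. The value "SKIP" and the NA entry are made up for the example. Note that the levels come out in order of first appearance, not in a chosen order such as PASS, WARN, FAIL.

```r
# Hypothetical data with an extra value ("SKIP") and a missing entry.
df <- data.frame(
  Test1 = c("PASS", "PASS", NA),
  Test2 = c("FAIL", "SKIP", "WARN"),
  stringsAsFactors = TRUE
)

# Collect every distinct value over all columns, then drop NA.
fac <- unique(as.vector(as.matrix(df)))
fac <- fac[!is.na(fac)]
print(fac)

# Every column gets the same level set; NA entries stay NA.
df1 <- data.frame(lapply(df, factor, levels = fac))
str(df1)
```

If a specific code per status is needed for the heatmap, the level vector still has to be written out or reordered by hand before calling factor.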