I have a data frame (called gen) filled with nucleotide information: each value is either A, C, G, or T. I want to replace A with 1, C with 2, G with 3, and T with 4. When I run gen[gen==A] = 1, I get the error:
Error in `[<-.data.frame`(`*tmp*`, gen == A, value = 1) :
  object 'A' not found
I also tried gen <- replace(gen, gen == A, 1), but it gives me the same error. Does anyone know how to fix this? If not, is there an R package with a function that will convert A, C, G, and T to numeric values?
Thanks
You need to wrap A in quotes ("A"); otherwise R looks for a variable named A.
If the columns are character vectors:
R> gen = data.frame(x = sample(c("A", "C", "G", "T"), 10, replace = TRUE), y = sample(c("A", "C", "G", "T"), 10, replace= TRUE), stringsAsFactors = FALSE)
R> gen[gen == "A"] = 1
R> gen
   x y
1  1 1
2  C C
3  G T
4  T T
5  G G
6  G G
7  1 1
8  C C
9  T 1
10 1 1
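One thing the output above does not show is the column type. The sketch below, using a small fixed data frame rather than a random sample (the values are made up for illustration), suggests that replacing "A" with 1 this way leaves the column as character, so the 1 is stored as the string "1" rather than as a number.

```r
# Small fixed example so the result is reproducible.
gen <- data.frame(x = c("A", "C", "A"), stringsAsFactors = FALSE)

# Replace every "A" with 1; the numeric 1 is coerced to fit the character column.
gen[gen == "A"] <- 1

class(gen$x)  # "character"
gen$x         # "1" "C" "1"
```

If numeric columns are the goal, all four letters need to be recoded before converting the column type.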
You can also recode all four letters at once with recode() from the car package:
R> library(car)
R> sapply(gen, recode, recodes = "'A'=1; 'C'=2; 'G'=3; 'T'=4")
      x y
 [1,] 1 1
 [2,] 2 2
 [3,] 3 4
 [4,] 4 4
 [5,] 3 3
 [6,] 3 3
 [7,] 1 1
 [8,] 2 2
 [9,] 4 1
[10,] 1 1
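If you would rather not install a package, the same mapping can probably be done in base R with a named lookup vector. This is only a sketch: the name `codes` and the fixed example data are assumptions for illustration, not part of the original question.

```r
# Hypothetical lookup table: names are nucleotides, values are the codes.
codes <- c(A = 1, C = 2, G = 3, T = 4)

# Small fixed example instead of random sampling, so the result is reproducible.
gen <- data.frame(x = c("A", "C", "G", "T"),
                  y = c("T", "T", "A", "G"),
                  stringsAsFactors = FALSE)

# Index the lookup table by each column's values; as.character() also
# makes this work when the columns are factors.
gen[] <- lapply(gen, function(col) unname(codes[as.character(col)]))

gen$x  # 1 2 3 4
gen$y  # 4 4 1 3
```

Assigning into `gen[]` keeps the result a data frame with numeric columns, whereas sapply() returns a matrix. Any value that is not a name in `codes` becomes NA.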
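If the columns are factors (since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE has to be set explicitly):
R> gen = data.frame(x = sample(c("A", "C", "G", "T"), 10, replace = TRUE), y = sample(c("A", "C", "G", "T"), 10, replace= TRUE), stringsAsFactors = TRUE)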
R> sapply(gen, as.numeric)
      x y
 [1,] 1 1
 [2,] 2 4
 [3,] 1 2
 [4,] 4 1
 [5,] 2 2
 [6,] 1 4
 [7,] 4 3
 [8,] 3 3
 [9,] 2 4
[10,] 4 2
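One caveat with the factor approach: as.numeric() returns the position of each value among the factor's levels, and by default the levels are the sorted values that actually occur. The mapping A=1, C=2, G=3, T=4 therefore only holds if all four letters appear in the column. A minimal sketch of the problem, and of fixing the levels explicitly (the example vector is made up):

```r
# A column that happens to contain no "A" or "C".
x <- c("G", "T", "G")

# Default levels are just "G" and "T", so the codes shift.
implicit <- as.numeric(factor(x))
implicit  # 1 2 1

# Stating all four levels keeps the intended mapping.
explicit <- as.numeric(factor(x, levels = c("A", "C", "G", "T")))
explicit  # 3 4 3
```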