I got this question using read.table() with or without header=T, trying to extract a vector of doubles from the resulting data.frame with as.double(as.character()) (see ?factor).
But that's just how I realized that I don't understand R's logic. So you won't see e.g. read.table in the code below, only the necessary parts. Could you tell me what's the difference between the following options?
Equivalent of the header=T case:
(a <- data.frame(array(c(0.5,0.5,0.5,0.5), c(1,4))))
as.character(a)
# [1] "0.5" "0.5" "0.5" "0.5"
Equivalent of the case without header=T:
b <- data.frame(array(c("a",0.5,"b",0.5,"c",0.5,"d",0.5), c(2,4)))
(a <- b[2,])
as.character(a)
# [1] "1" "1" "1" "1"
(a <- data.frame(a, row.names=NULL)) # now there's not even a visual difference
as.character(a)
# [1] "1" "1" "1" "1"
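The problem lies in the default settings of data.frame, where one of the options, stringsAsFactors, is set to TRUE (this was the default before R 4.0.0; since then it defaults to FALSE). This is a problem in your scenario because when you use header = FALSE, the character values in what would have been the header row coerce each entire column to character, which is then converted to a factor (unless you set stringsAsFactors = FALSE). Calling as.character() on a one-row data.frame of factors then returns the factors' underlying integer codes ("1") rather than their labels ("0.5").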
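To make the factor step concrete, here is a minimal sketch. The names d and row are purely illustrative, and stringsAsFactors = TRUE is passed explicitly as an assumption so that the result does not depend on which default your R version uses:

```r
# One column holding a label and a number-as-text; it is stored as a factor.
# stringsAsFactors = TRUE is spelled out so this behaves the same on any R version.
d <- data.frame(X1 = c("a", "0.5"), stringsAsFactors = TRUE)

levels(d$X1)        # the distinct labels, sorted
# [1] "0.5" "a"
as.integer(d$X1)    # the integer codes stored internally
# [1] 2 1
as.character(d$X1)  # on the factor itself, you get the labels back
# [1] "a"   "0.5"

# A one-row data.frame is a list of length-one factors; as.character()
# on that list reports each factor's integer code, not its label.
row <- d[2, , drop = FALSE]
as.character(row)
# [1] "1"

# Converting the column itself recovers the label, and then the number.
as.double(as.character(row$X1))
# [1] 0.5
```

The "1" is the position of "0.5" among the sorted levels, which is why every column in the question prints "1".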
Here are some examples to play with:
## Two similar `data.frame`s -- just one argument different
## (on R >= 4.0.0, add stringsAsFactors = TRUE to `b` to reproduce the factor output)
b <- data.frame(array(c("a",0.5,"b",0.5,"c",0.5,"d",0.5), c(2,4)))
b2 <- data.frame(array(c("a",0.5,"b",0.5,"c",0.5,"d",0.5), c(2,4)),
                 stringsAsFactors = FALSE)
## First with "b"
as.character(b[2, ])
# [1] "1" "1" "1" "1"
sapply(b[2, ], as.character)
# X1 X2 X3 X4
# "0.5" "0.5" "0.5" "0.5"
as.matrix(b)[2, ]
# X1 X2 X3 X4
# "0.5" "0.5" "0.5" "0.5"
as.double(as.matrix(b)[2, ])
# [1] 0.5 0.5 0.5 0.5
## Now with "b2"
as.character(b2[2, ])
# [1] "0.5" "0.5" "0.5" "0.5"
as.double(as.character(b2[2, ]))
# [1] 0.5 0.5 0.5 0.5
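To tie this back to the read.table() case from the question, the following is a rough sketch rather than a reconstruction of your actual file. The two-line input txt is made up for illustration, and the names with_header, no_header and vals are hypothetical:

```r
# Hypothetical input: a row of labels followed by a row of numbers.
txt <- "a b c d\n0.5 0.5 0.5 0.5"

# header = TRUE: the labels become column names, so every column is numeric.
with_header <- read.table(text = txt, header = TRUE)
sapply(with_header, class)

# header = FALSE: the labels are data, so every column is read as text.
# stringsAsFactors = FALSE keeps them as character instead of factor.
no_header <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)
sapply(no_header, class)

# With character columns, the second row converts directly to doubles.
vals <- as.double(unlist(no_header[2, ]))
vals
```

Passing stringsAsFactors = FALSE explicitly keeps the result the same regardless of the R version's default.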