R - Extracting a vector of doubles from data.frame

Posted 2019-09-21 02:58

Question:

I ran into this while using read.table(), with and without header=T, and trying to extract a vector of doubles from the resulting data.frame via as.double(as.character()) (see ?factor).

But that's just how I realized that I don't understand R's logic. So you won't see e.g. read.table in the code below, only the necessary parts. Could you tell me what's the difference between the following options?

  1. With header=T equivalent:

    (a <- data.frame(array(c(0.5,0.5,0.5,0.5), c(1,4))))
    as.character(a)
    # [1] "0.5" "0.5" "0.5" "0.5"
    
  2. Without header=T equivalent:

    b <- data.frame(array(c("a",0.5,"b",0.5,"c",0.5,"d",0.5), c(2,4)))
    (a <- b[2,])
    as.character(a)
    # [1] "1" "1" "1" "1"
    
    (a <- data.frame(a, row.names=NULL)) # now there's not even a visual difference
    as.character(a)
    # [1] "1" "1" "1" "1"
    

Answer 1:

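The problem lies in a default setting of data.frame: the stringsAsFactors option is set to TRUE. (That was the default up to R 3.6; since R 4.0.0 it is FALSE, so on current versions you need to pass stringsAsFactors = TRUE to reproduce the output below.) This matters in your scenario because when you use header = FALSE, the header row is read as data, and the character values in that row coerce every column to character, which is then converted to a factor (unless you set stringsAsFactors = FALSE). Calling as.character() on a data.frame row then returns each factor's internal integer code rather than its label, which is why you see "1" instead of "0.5".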
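To see where the "1" comes from, it may help to inspect a single cell of the data.frame. This is a minimal sketch that passes stringsAsFactors = TRUE explicitly so it behaves the same on any R version; the variable name x is only illustrative:

```r
# Same data as in the question, with factor conversion forced on.
b <- data.frame(array(c("a", 0.5, "b", 0.5, "c", 0.5, "d", 0.5), c(2, 4)),
                stringsAsFactors = TRUE)

x <- b[2, 1]      # one cell: a factor of length 1

levels(x)         # levels are sorted, so "0.5" comes before "a"
as.integer(x)     # internal code of the value: its position in levels(x)
as.character(x)   # the label that the code stands for
```

Every column holds one letter plus "0.5", and "0.5" sorts first in each, so the second row has code 1 in all four columns.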

Here are some examples to play with:

    ## Two similar `data.frame`s -- just one argument different

    b <- data.frame(array(c("a",0.5,"b",0.5,"c",0.5,"d",0.5), c(2,4)))
    b2 <- data.frame(array(c("a",0.5,"b",0.5,"c",0.5,"d",0.5), c(2,4)),
                    stringsAsFactors = FALSE)

    ## First with "b"

    as.character(b[2, ])
    # [1] "1" "1" "1" "1"

    sapply(b[2, ], as.character)
    #    X1    X2    X3    X4 
    # "0.5" "0.5" "0.5" "0.5"
    as.matrix(b)[2, ]
    #    X1    X2    X3    X4 
    # "0.5" "0.5" "0.5" "0.5"
    as.double(as.matrix(b)[2, ])
    # [1] 0.5 0.5 0.5 0.5

    ## Now with "b2"

    as.character(b2[2, ])
    # [1] "0.5" "0.5" "0.5" "0.5"
    as.double(as.character(b2[2, ]))
    # [1] 0.5 0.5 0.5 0.5
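
If you would rather not care whether the columns ended up as factors or as character, one option is to convert each cell to its label first and only then to a number. A minimal sketch; row_as_double is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper (not part of base R): return row `i` of `df` as doubles.
row_as_double <- function(df, i) {
  # as.character() on a single factor cell yields its label and leaves a
  # character cell unchanged, so this works with either column type.
  labels <- vapply(df[i, , drop = FALSE], as.character, character(1))
  as.double(labels)  # as.double() also drops the column names
}

x  <- array(c("a", 0.5, "b", 0.5, "c", 0.5, "d", 0.5), c(2, 4))
b  <- data.frame(x, stringsAsFactors = TRUE)   # factor columns
b2 <- data.frame(x, stringsAsFactors = FALSE)  # character columns

row_as_double(b, 2)
row_as_double(b2, 2)
```

Both calls return four values of 0.5. Row 1 holds letters, so asking for it would give NA values with a coercion warning.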