I would like to reassign 128 column classes with a

Posted 2019-09-10 16:24

Question:

I can't find what I need in other posts. Essentially:

  1. I need to reorder my data after reading it in as a data.table (I can't pass colClasses to the fread statement because my columns are out of order).
  2. I need to change the column classes to the ones listed below.

A lot of the other posts seem to be changing all of one type of class to another type of class:

Change the class of many columns in a data frame

Convert column classes in data.table

I believe my problem is different because there is no blanket rule such as "change all factors to characters". Each column has a specific class, known ahead of time, that it must be converted to.

I have my column names in a vector called selectColumns that I pass to fread.

selectColumns <- c(giantListofColumnsGoesHere)
DT <- fread("DT.csv", select=selectColumns, na.strings=NAsList)

setcolorder(DT, selectColumns)
colClasses <- list('character','character','character','factor','numeric','character','numeric','integer','integer','integer','integer','numeric','numeric','factor','factor','factor','logical','integer','numeric','factor','integer','integer','integer','factor','factor','factor','factor','factor','integer','integer','factor','integer','factor','factor','integer','factor','numeric','factor','numeric','character','factor','factor','factor','factor','factor','factor','factor','factor','factor','factor','integer','factor','numeric','factor','factor','character','factor','factor','factor','integer','numeric','integer','integer','integer','integer','integer','factor','character','factor','factor','factor','factor','integer','factor','factor','character','integer','integer','integer','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical','logical')

#Now the part I can't figure out, I've tried:
lapply(DT, class) <- colClasses
#OR
attr(DT, class) <- colClasses
#Obviously attr(DT, class) just gives "data.table" "data.frame"

I think I need to get at the column-level attributes of DT rather than the attributes of the table itself, but I'm not great with lists and I can't figure this out. I'm sorry if this is too easy a question or has essentially been answered already, but I'm lost, and it seems like there is usually an easy way to do this.

I'm sorry I can't share the data; it contains private information.

Thanks for any help everyone.

Answer 1:

If the OP forgot to use colClasses inside fread, or there is some technical difficulty in using it, and the column classes need to be changed after the data.table has been read, set is an option:

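# Convert each selected column in place; colClasses holds converter names such as "as.numeric".
# [[j]] extracts a single name whether colClasses is a character vector or a list.
for (j in seq_along(selectColumns)) {
  set(DT, i = NULL, j = selectColumns[j],
      value = match.fun(colClasses[[j]])(DT[[selectColumns[j]]]))
}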

str(DT)
#Classes ‘data.table’ and 'data.frame':  5 obs. of  6 variables:
#$ V1: num  1 2 3 4 5
#$ V2: chr  "A" "B" "C" "D" ...
#$ V3: int  1 2 3 4 5
#$ V4: chr  "F" "G" "H" "I" ...
#$ V5: chr  "G" "H" "I" "J" ...
#$ V6: Factor w/ 5 levels "6","7","8","9",..: 1 2 3 4 5

Note that the initial classes of the "selectColumns" columns were:

str(DT)
#Classes ‘data.table’ and 'data.frame':  5 obs. of  6 variables:
#$ V1: int  1 2 3 4 5
#$ V2: chr  "A" "B" "C" "D" ...
#$ V3: num  1 2 3 4 5
#$ V4: chr  "F" "G" "H" "I" ...
#$ V5: chr  "G" "H" "I" "J" ...
#$ V6: int  6 7 8 9 10

data

 DT <- data.table(V1= 1:5, V2 = LETTERS[1:5], V3 = as.numeric(1:5),
          V4 = LETTERS[6:10], V5 = LETTERS[7:11], V6 = 6:10)
 colClasses <- paste0("as.",c("numeric", "integer", "factor"))
 selectColumns <- c("V1", "V3", "V6")
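If the target classes are stored as plain class names, as in the question, rather than as function names, the same loop can build the converter name on the fly. The following is only a sketch under that assumption; DT2, targetClasses and the column names are hypothetical and not taken from the question's data. Keying the classes by column name also means the order of the columns no longer matters.

```r
library(data.table)

# Hypothetical example data; column and class names are illustrative only.
DT2 <- data.table(id = 1:3,
                  score = c("1.5", "2.5", "3.5"),
                  grp = c("a", "b", "a"))

# Plain class names keyed by column name, deliberately not in column order.
targetClasses <- c(grp = "factor", id = "character", score = "numeric")

for (col in names(targetClasses)) {
  # Build the converter from the class name, e.g. "numeric" -> as.numeric
  converter <- match.fun(paste0("as.", targetClasses[[col]]))
  set(DT2, i = NULL, j = col, value = converter(DT2[[col]]))
}

sapply(DT2, class)
```

This only works for classes that have a matching as.<class> function, which is true for character, numeric, integer, logical and factor.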

NOTE: The as. prefix was added to the "colClasses" vector so that each entry names a conversion function. When converting 'factor' to 'numeric', the conversion has to be done in two steps, i.e. first to 'character' and then to 'numeric' (based on @Frank's suggestion in the comments).
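A small illustration of why the factor case needs two steps: calling as.numeric directly on a factor returns its internal level codes, not the values shown in the labels. The helper to_numeric below is a hypothetical name introduced for this sketch; it is not part of data.table or base R.

```r
f <- factor(c("10", "20", "10"))

as.numeric(f)                # 1 2 1    (internal level codes, not the values)
as.numeric(as.character(f))  # 10 20 10 (the labels parsed as numbers)

# Hypothetical helper: route factors through character before converting.
to_numeric <- function(x) {
  if (is.factor(x)) x <- as.character(x)
  as.numeric(x)
}

to_numeric(f)                # 10 20 10
```

A helper like this could be used in place of as.numeric in the loop for any column that may already be a factor.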