Change class of variables in a data frame using another data frame

Posted 2019-05-02 09:14

Question:

I am looking for a way to change the class of the variables in one data frame by using another data frame that holds the class of each variable.

I have a data set with around 150 variables, all stored as character. I want to change the class of each variable according to its type. For this, we created a separate data frame holding the class of each variable. Let me explain with a sample data frame.

Suppose my original data frame is df, with 5 variables:

df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e", stringsAsFactors = FALSE)

We also have a second data frame, variable_info, with just 2 variables: variable_name and variable_class.

variable_info <- data.frame(variable_name=c("A","B","C","D","E"),variable_class=c("character","integer","numeric","character","character"))

Using variable_info, I want to change the class of each variable in df to the class given in variable_info$variable_class, matching each column of df to its row through variable_info$variable_name.
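The goal above amounts to a lookup from column name to target class. A minimal sketch that writes that lookup down and lists which columns actually need converting, assuming the sample data from the question (stringsAsFactors = FALSE is added so the columns stay character on R versions before 4.0, where data.frame() turned strings into factors by default; the names current and target are only illustrative):

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the
# columns character on R versions before 4.0 as well
df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e",
                 stringsAsFactors = FALSE)
variable_info <- data.frame(
  variable_name  = c("A", "B", "C", "D", "E"),
  variable_class = c("character", "integer", "numeric", "character", "character"),
  stringsAsFactors = FALSE
)

# Class each column has now
current <- sapply(df, class)

# Class each column should have, looked up by column name
target <- setNames(variable_info$variable_class, variable_info$variable_name)[names(df)]

# Side-by-side view of the columns that need converting
print(data.frame(current, target)[current != target, ])
```

Only B and C differ here, so those are the columns a conversion step has to touch.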

How can I do this for a data frame? Would data.table be a good choice for this, and if so, how would I do it in data.table?

Thank you!!

Prasad

Answer 1:

You could try it like this:

First, make sure the rows of variable_info are in the same order as the columns of df:

variable_info <- variable_info[match(names(df), variable_info$variable_name), ]
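The argument order of match() matters: match(names(df), variable_info$variable_name) returns, for each column of df, the row of variable_info that describes it. Swapping the arguments gives the inverse permutation, which only coincides with the right order when the rows already line up. A small check with a deliberately shuffled copy (the shuffled order and the names good and bad are made up for illustration):

```r
df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e",
                 stringsAsFactors = FALSE)

# Class info with rows in a different order than the columns of df
info_shuffled <- data.frame(
  variable_name  = c("C", "A", "E", "B", "D"),
  variable_class = c("numeric", "character", "character", "integer", "character"),
  stringsAsFactors = FALSE
)

# Correct: for each column of df, the row of info_shuffled describing it
good <- info_shuffled[match(names(df), info_shuffled$variable_name), ]

# Reversed arguments: positions of each variable_name within names(df),
# i.e. the inverse permutation, which is generally the wrong order
bad <- info_shuffled[match(info_shuffled$variable_name, names(df)), ]

print(good$variable_name)  # "A" "B" "C" "D" "E"
print(bad$variable_name)   # "E" "C" "D" "A" "B"
```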

Look up the conversion function for each class (e.g. "integer" becomes as.integer); the result is a list of functions:

funs <- sapply(paste0("as.", variable_info$variable_class), match.fun)

Then apply each function to its column. Converting with as.character() first also handles columns that were stored as factors:

df[] <- Map(function(dd, f) f(as.character(dd)), df, funs)
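Putting the three steps together, here is a self-contained sketch on the sample data, followed by a check of the resulting classes. It uses lapply() instead of sapply() to build the list of functions, which is a choice made here rather than in the answer; both yield a list of functions that Map() can use.

```r
df <- data.frame(A = "a", B = "1", C = "111111", D = "d", E = "e",
                 stringsAsFactors = FALSE)
variable_info <- data.frame(
  variable_name  = c("A", "B", "C", "D", "E"),
  variable_class = c("character", "integer", "numeric", "character", "character"),
  stringsAsFactors = FALSE
)

# 1. Align the rows of variable_info with the columns of df
variable_info <- variable_info[match(names(df), variable_info$variable_name), ]

# 2. Look up a converter for each class, e.g. "integer" -> as.integer
funs <- lapply(paste0("as.", variable_info$variable_class), match.fun)

# 3. Apply each converter to its column; assigning into df[] keeps df a data frame
df[] <- Map(function(dd, f) f(as.character(dd)), df, funs)

print(sapply(df, class))
str(df)
```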

With data.table you can do it almost the same way, except that you replace the last line with:

library(data.table)
dt <- as.data.table(df) # or setDT(df) to convert df by reference
dt[, (names(dt)) := Map(function(dd, f) f(as.character(dd)), .SD, funs)]
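Another data.table option, sketched here as an alternative rather than taken from the answer, is to loop over the columns with set(), which replaces each column by reference one at a time. It assumes funs is built in the same column order as the table; the sample table is recreated so the block runs on its own.

```r
library(data.table)

dt <- data.table(A = "a", B = "1", C = "111111", D = "d", E = "e")
variable_info <- data.frame(
  variable_name  = c("A", "B", "C", "D", "E"),
  variable_class = c("character", "integer", "numeric", "character", "character"),
  stringsAsFactors = FALSE
)

# Align the class info with the column order of dt and look up the converters
variable_info <- variable_info[match(names(dt), variable_info$variable_name), ]
funs <- lapply(paste0("as.", variable_info$variable_class), match.fun)

# Replace each column in place; set() skips the overhead of [.data.table
# when called repeatedly in a loop
for (j in seq_along(funs)) {
  set(dt, j = j, value = funs[[j]](as.character(dt[[j]])))
}

print(sapply(dt, class))
```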


Answer 2:

An alternative approach is to use a function. This function takes any pair of data frames, finds their common columns and assigns the class of each column in the first to the matching column in the second.

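    matchColClasses <- function(df1, df2) {
      # Purpose:  protect joins from column type mismatches - a problem with multi-column empty df
      # Input:    df1 - master for class assignments, df2 - for col reclass and return.
      # Output:   df2 with shared columns classed to match df1
      # Usage:    df2 <- matchColClasses(df1, df2)
      # Note:     `class<-` coerces basic types (character, integer, numeric, logical);
      #           for classes such as Date or factor it only sets the attribute.

      sharedColNames <- names(df1)[names(df1) %in% names(df2)]
      # lapply + list-style indexing: no dropping to a vector when only one
      # column is shared, and multi-valued classes stay intact
      sharedColTypes <- lapply(df1[sharedColNames], class)

      for (n in sharedColNames) {
        class(df2[[n]]) <- sharedColTypes[[n]]
      }

      return(df2)
    }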
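matchColClasses depends on how `class<-` behaves. For basic types such as "character", "integer" and "numeric", assigning the class coerces the values, but for other classes (for example "Date" or "factor") it only sets the class attribute and leaves the data untouched. A short check of that behaviour (the variable names are only illustrative):

```r
# Basic types: assigning the class coerces the underlying values
x <- c("1", "2")
class(x) <- "numeric"
print(typeof(x))     # "double"

y <- "7"
class(y) <- "integer"
print(typeof(y))     # "integer"

# Other classes: only the attribute changes; the data stay character,
# so this is not a usable Date
d <- "2019-05-02"
class(d) <- "Date"
print(typeof(d))     # "character"

# For such classes an explicit converter does the real work
d_ok <- as.Date("2019-05-02")
print(typeof(d_ok))  # "double"
```

So the function works as expected when the template data frame uses basic types; for Date or factor columns an explicit as.*() conversion, as in Answer 1, is the safer route.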