R: compare column types between two data frames

Posted 2019-06-01 00:18

Question:

This may be a bad question because I am not posting a reproducible example. My main goal is to identify columns that have different types in two data frames that share the same column names.

For example

df1

 Id      Col1      Col2     Col3
 Numeric Factor    Integer  Date

df2

 Id      Col1      Col2     Col3
 Numeric Numeric   Integer  Date

Here both data frames (df1, df2) have the same column names, but the type of Col1 differs, and I want to identify such columns. Expected output:

Col1  Factor    Numeric

Any suggestions or tips on achieving this? Thanks.

Answer 1:

Try this:

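compareColumns <- function(df1, df2) {
  # Only compare columns that exist in both data frames
  commonNames <- intersect(names(df1), names(df2))
  # df[commonNames] stays a data frame even when only one column is shared
  data.frame(Column = commonNames,
             df1 = sapply(df1[commonNames], class),
             df2 = sapply(df2[commonNames], class),
             stringsAsFactors = FALSE)
}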
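
The function above lists every shared column, while the question asks only for the ones that differ. A minimal sketch of filtering the result follows. The function is restated so that the block runs on its own, and the data frames a and b are made-up examples, not data from the question.

```r
# A version of the function above, restated so this block is self-contained
compareColumns <- function(df1, df2) {
  commonNames <- intersect(names(df1), names(df2))
  data.frame(Column = commonNames,
             df1 = sapply(df1[commonNames], class),
             df2 = sapply(df2[commonNames], class),
             stringsAsFactors = FALSE)
}

# Hypothetical example data: Col1 is a factor in a and numeric in b
a <- data.frame(Id = c(1, 2), Col1 = factor(c("x", "y")), Col2 = 1:2)
b <- data.frame(Id = c(1, 2), Col1 = c(0.5, 1.5), Col2 = 1:2)

cmp <- compareColumns(a, b)

# Keep only the rows where the two class columns disagree
mismatch <- cmp[cmp$df1 != cmp$df2, ]
print(mismatch)
```

With this data, mismatch has a single row for Col1 (factor in a, numeric in b). The sketch assumes every column has exactly one class.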


Answer 2:

For a more compact method, you could pass a list of the data frames to sapply(). Efficiency shouldn't be a problem here, since all we're doing is grabbing the class of each column. I name the list elements after the data frames to make the output clearer.

m <- sapply(list(df1 = df1, df2 = df2), sapply, class)
m[m[, "df1"] != m[, "df2"], , drop = FALSE]
#      df1      df2        
# Col1 "factor" "character"

Here df1 and df2 are the example data frames from @ycw's answer (Answer 4 below).
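
One caveat with sapply(df, class): a column can carry more than one class (a POSIXct column has both "POSIXct" and "POSIXt"), in which case sapply() can return a list instead of a character vector and the element-wise comparison is no longer reliable. A possible workaround, sketched below, collapses each class vector into a single string first. The helper name col_classes and the data frames x and y are hypothetical, and the sketch assumes both data frames have the same columns in the same order.

```r
# Hypothetical helper: one class string per column, e.g. "POSIXct/POSIXt"
col_classes <- function(df) {
  vapply(df, function(col) paste(class(col), collapse = "/"), character(1))
}

# Illustrative data: 'when' is a Date in x and a POSIXct in y
x <- data.frame(id = 1:2, when = as.Date(c("2017-01-02", "2017-01-03")))
y <- data.frame(id = 1:2,
                when = as.POSIXct(c("2017-01-02", "2017-01-03"), tz = "UTC"))

# Character matrix with one row per column and one column per data frame
m <- cbind(x = col_classes(x), y = col_classes(y))

# Keep only the rows whose class strings differ
diffs <- m[m[, "x"] != m[, "y"], , drop = FALSE]
print(diffs)
```

Using vapply() with character(1) also makes the helper fail loudly if a column ever produces something other than a single string.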



Answer 3:

If the two data frames have the same column names, the code below returns the columns whose classes differ.

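library(dplyr)
m1 = mtcars
# Convert two columns to factors so that their classes differ from m1
m2 = mtcars %>% mutate(cyl = factor(cyl), vs = factor(vs))
out = cbind(sapply(m1, class), sapply(m2, class))
# Keep the rows whose two classes differ; drop = FALSE keeps a matrix
# even when only one column differs
out[apply(out, 1, function(x) !identical(x[1], x[2])), , drop = FALSE]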


Answer 4:

We can use sapply with class to loop through all columns in df1 and df2. After that, we can compare the results.

# Create example data frames
df1 <- data.frame(ID = 1:3,
                  Col1 = as.character(2:4),
                  Col2 = 2:4,
                  Col3 = as.Date(paste0("2017-01-0", 2:4)),
                  stringsAsFactors = TRUE)  # explicit: the default is FALSE since R 4.0.0

df2 <- data.frame(ID = 1:3,
                  Col1 = as.character(2:4),
                  Col2 = 2:4,
                  Col3 = as.Date(paste0("2017-01-0", 2:4)),
                  stringsAsFactors = FALSE)

# Use sapply() and class() to get the class of every column
class1 <- sapply(df1, class)
class2 <- sapply(df2, class)

# Combine the results, then filter for rows that are different
result <- data.frame(class1, class2, stringsAsFactors = FALSE)
result[result$class1 != result$class2, ]
#      class1    class2
# Col1 factor character


Answer 5:

Try compare_df_cols() from the janitor package:

library(janitor)
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)
compare_df_cols(mtcars, mtcars2, return = "mismatch")

#>   column_name  mtcars   mtcars2
#> 1         cyl numeric character

Self-promotion alert: I authored this package. I am posting this function because it exists to solve precisely this problem.