Question:
This may be a bad question because I am not posting a reproducible example. My main goal is to identify columns that have different types in two data frames that share the same column names.
For example
df1
Id Col1 Col2 Col3
Numeric Factor Integer Date
df2
Id Col1 Col2 Col3
Numeric Numeric Integer Date
Here both data frames (df1, df2) have the same column names, but the type of Col1 differs, and I am interested in identifying such columns. Expected output:
Col1 Factor Numeric
Any suggestions or tips on achieving this? Thanks.
Answer 1:
Try this:
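compareColumns <- function(df1, df2) {
  # Columns present in both data frames
  commonNames <- intersect(names(df1), names(df2))
  # df[commonNames] stays a data frame even when only one column is shared
  data.frame(Column = commonNames,
             df1 = sapply(df1[commonNames], class),
             df2 = sapply(df2[commonNames], class))
}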
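The function above lists the class of every shared column, so one more filtering step is needed to keep only the mismatches the question asks for. Below is a minimal sketch of that step. The helper is a lightly adapted copy so the snippet runs on its own, and the data frames `a` and `b` are hypothetical examples, not data from the question.

```r
# Lightly adapted copy of the helper, repeated so the snippet is self-contained
compareColumns <- function(df1, df2) {
  commonNames <- intersect(names(df1), names(df2))
  data.frame(Column = commonNames,
             df1 = sapply(df1[commonNames], class),
             df2 = sapply(df2[commonNames], class),
             stringsAsFactors = FALSE)
}

# Hypothetical example data: Col1 is a factor in `a` and numeric in `b`
a <- data.frame(Id = c(1, 2), Col1 = factor(c("x", "y")), Col2 = 1:2)
b <- data.frame(Id = c(1, 2), Col1 = c(0.5, 1.5), Col2 = 1:2)

cmp <- compareColumns(a, b)

# Keep only the rows where the two classes differ
mismatch <- cmp[cmp$df1 != cmp$df2, ]
print(mismatch)
```

This assumes every column has a single class; a column with several classes (a date-time, for instance) would need its class vector collapsed first.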
Answer 2:
For a more compact method, you could use a list with sapply(). Efficiency shouldn't be a problem here, since all we're doing is grabbing the class. Here I add the data frame names to the list to produce clearer output.
m <- sapply(list(df1 = df1, df2 = df2), sapply, class)
m[m[, "df1"] != m[, "df2"], , drop = FALSE]
#      df1      df2
# Col1 "factor" "character"
where df1 and df2 are the data from @ycw's answer.
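One caveat with this approach: when a column carries more than one class (a POSIXct date-time has both "POSIXct" and "POSIXt"), sapply(df, class) returns a list rather than a character vector, and the matrix comparison may not behave as intended. A possible workaround, sketched here under that assumption, is to collapse each class vector into a single string first. The helper name `classString` and the data frames `x` and `y` are hypothetical and only serve the illustration.

```r
# Collapse each column's class vector into one string,
# e.g. c("POSIXct", "POSIXt") becomes "POSIXct/POSIXt"
classString <- function(df) {
  vapply(df, function(col) paste(class(col), collapse = "/"), character(1))
}

# Hypothetical example data: `when` is a date-time in x and a character in y
x <- data.frame(id = 1:2,
                when = as.POSIXct(c("2017-01-02", "2017-01-03"), tz = "UTC"))
y <- data.frame(id = 1:2,
                when = c("2017-01-02", "2017-01-03"),
                stringsAsFactors = FALSE)

# One row per column, one column per data frame
m <- cbind(df1 = classString(x), df2 = classString(y))

# Keep only the rows where the collapsed class strings differ
res <- m[m[, "df1"] != m[, "df2"], , drop = FALSE]
print(res)
```

vapply with character(1) is used so that the result is always a character vector, whatever classes the columns have.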
Answer 3:
If two data frames have the same column names, the code below returns the columns whose classes differ.
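library(dplyr)
m1 = mtcars
# Convert two columns to factors so their classes differ from the original
m2 = mtcars %>% mutate(cyl = factor(cyl), vs = factor(vs))
# One row per column: class in m1, then class in m2
out = cbind(sapply(m1, class), sapply(m2, class))
# Keep only the rows where the two classes differ
out[apply(out, 1, function(x) !identical(x[1], x[2])), , drop = FALSE]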
Answer 4:
We can use sapply with class to loop through all columns in df1 and df2. After that, we can compare the results.
# Create example data frames
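# stringsAsFactors = TRUE is set explicitly because the default is FALSE since R 4.0.0
df1 <- data.frame(ID = 1:3,
Col1 = as.character(2:4),
Col2 = 2:4,
Col3 = as.Date(paste0("2017-01-0", 2:4)),
stringsAsFactors = TRUE)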
df2 <- data.frame(ID = 1:3,
Col1 = as.character(2:4),
Col2 = 2:4,
Col3 = as.Date(paste0("2017-01-0", 2:4)),
stringsAsFactors = FALSE)
# Use sapply and class to find the class of every column
class1 <- sapply(df1, class)
class2 <- sapply(df2, class)
# Combine the results, then filter for rows that are different
result <- data.frame(class1, class2, stringsAsFactors = FALSE)
result[!(result$class1 == result$class2), ]
#      class1    class2
# Col1 factor character
Answer 5:
Try compare_df_cols() from the janitor package:
library(janitor)
mtcars2 <- mtcars
mtcars2$cyl <- as.character(mtcars2$cyl)
compare_df_cols(mtcars, mtcars2, return = "mismatch")
#>   column_name  mtcars   mtcars2
#> 1         cyl numeric character
Self-promotion alert: I authored this package. I am posting this function because it exists to solve precisely this problem.