Union of dataframes in R by rownames

Posted 2020-07-13 08:59

Question:

I have 4 dataframes, each stored as an element of a list. I would like to combine them all into one dataframe. In the set language of mathematics, the most natural way to describe this is a union on the rownames. So I might have something like this:

U <- union(dfSub[[1]], dfSub[[2]], dfSub[[3]], dfSub[[4]])

The problem with the union function is that it operates only on vectors. How can I get this to work on dataframes?

  1. How can I translate this into R?
  2. Is there a better way of achieving the desired result?

EDIT: How can I preserve rownames after the union?

Answer 1:

First, bind them together:

df.cat <- rbind(dfSub[[1]], dfSub[[2]], dfSub[[3]], dfSub[[4]])

or better:

df.cat <- do.call(rbind, dfSub[1:4])

This first step requires that all data.frames have the same column names. If that is not the case, you might be interested in the rbind.fill function from the plyr package, which fills missing columns with NA:

library(plyr)
df.cat <- rbind.fill(dfSub[1:4])
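
If plyr is not available, the same idea can be approximated in base R. This is only a sketch: fill_bind is a hypothetical helper (not part of plyr or base R) that pads each data frame with NA columns before calling rbind, and the two small data frames are made-up examples.

```r
# Hypothetical helper: row-bind data frames whose columns differ,
# padding missing columns with NA (a rough base R stand-in for rbind.fill).
fill_bind <- function(dfs) {
  all.cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (m in setdiff(all.cols, names(d))) {
      d[[m]] <- NA          # add each missing column, filled with NA
    }
    d[all.cols]             # put the columns in a common order
  })
  do.call(rbind, padded)
}

# Made-up example data
a <- data.frame(x = 1:2, y = c("p", "q"), stringsAsFactors = FALSE)
b <- data.frame(x = 3L, z = TRUE)
df.cat <- fill_bind(list(a, b))
```

In this sketch the result has columns x, y and z, with NA wherever an input lacked the column.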

Then, if needed, remove duplicate rows (as a set union would):

df.union <- unique(df.cat)
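
Regarding the EDIT about rownames: unique compares row contents rather than rownames, and rbind makes the combined rownames unique, so a repeated name may come back altered. If the union should be keyed on the rownames themselves, one possible approach is to collect the original names first and keep only the first row seen for each name. The data frames below are made up for illustration, and keeping the first occurrence is an assumption about the desired behaviour.

```r
# Made-up example data with meaningful rownames
df1 <- data.frame(x = c(1, 2), row.names = c("a", "b"))
df2 <- data.frame(x = c(2, 3), row.names = c("b", "c"))
dfs <- list(df1, df2)

# Collect the original rownames before rbind has a chance to alter them
all.names <- unlist(lapply(dfs, rownames))

# Stack the rows, then keep the first row for each distinct rowname
df.cat <- do.call(rbind, unname(dfs))
keep <- !duplicated(all.names)
df.union <- df.cat[keep, , drop = FALSE]

# Restore the original rownames on the rows that were kept
rownames(df.union) <- all.names[keep]
```

Here df.union has one row per distinct rowname (a, b, c), in order of first appearance.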


Answer 2:

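You can combine dataframes with the merge function. Since you have multiple dataframes, you can use Reduce to merge them all in one call. By default, merge joins on the columns the dataframes have in common and keeps only the rows whose keys appear in both.

merged.data <- Reduce(function(...) merge(...), list(dfSub[[1]], dfSub[[2]], dfSub[[3]], dfSub[[4]]))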

As an example:

> people <- c('Bob', 'Jane', 'Pat')
> height <- c(72, 64, 68)
> weight <- c(220, 130, 150)
> age <- c(45, 32, 35)
> height.data <- data.frame(people, height)
> weight.data <- data.frame(people, weight)
> age.data <- data.frame(people, age)

> height.data
  people height
1    Bob     72
2   Jane     64
3    Pat     68
> weight.data
  people weight
1    Bob    220
2   Jane    130
3    Pat    150
> age.data
  people age
1    Bob  45
2   Jane  32
3    Pat  35


> Reduce(function(...) merge(...), list(height.data, weight.data, age.data))
  people height weight age
1    Bob     72    220  45
2   Jane     64    130  32
3    Pat     68    150  35
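
In the example above every person appears in all three dataframes. When the keys differ, the default merge drops any key missing from one input, which is closer to an intersection than a union. For a union on rownames, a sketch along these lines may help. It relies on merge's by = "row.names" and all = TRUE arguments; merge_rownames is a hypothetical helper, and the data are made up.

```r
# Hypothetical helper: outer-join two data frames on their rownames
# and put the rownames back afterwards.
merge_rownames <- function(x, y) {
  m <- merge(x, y, by = "row.names", all = TRUE)  # all = TRUE keeps unmatched rows
  rownames(m) <- as.character(m$Row.names)        # merge stores the key in "Row.names"
  m$Row.names <- NULL
  m
}

# Made-up example data: the rownames only partly overlap
height.df <- data.frame(height = c(72, 64), row.names = c("Bob", "Jane"))
weight.df <- data.frame(weight = c(130, 150), row.names = c("Jane", "Pat"))
age.df    <- data.frame(age = c(45, 35), row.names = c("Bob", "Pat"))

merged.union <- Reduce(merge_rownames, list(height.df, weight.df, age.df))
```

Each person ends up with one row, and values missing from an input come back as NA.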