rbind data frames, duplicated rownames issue

Posted 2019-04-12 07:58

Question:

While a matrix may have duplicated row (and column) names, a data.frame requires unique row names. Trying to rbind() data frames that have row names in common highlights this. Consider the two data frames below:

foo = data.frame(a=1:3, b=5:7)
rownames(foo)=c("w","x","y")
bar = data.frame(a=c(2,4), b=c(6,8))
rownames(bar)=c("x","z")
# foo               bar
#   a b               a b
# w 1 5             x 2 6
# x 2 6             z 4 8
# y 3 7

Now try to rbind() them (pay attention to the row names):

rbind(foo, bar)
#    a b
# w  1 5
# x  2 6
# y  3 7
# x1 2 6
# z  4 8
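
The renaming of the second "x" to "x1" looks like what make.unique() produces with an empty separator. The sketch below only checks that the two agree for this example; that rbind() uses make.unique() internally is an assumption here, not something stated in the post.

```r
foo <- data.frame(a = 1:3, b = 5:7)
rownames(foo) <- c("w", "x", "y")
bar <- data.frame(a = c(2, 4), b = c(6, 8))
rownames(bar) <- c("x", "z")

# Row names as they would be before de-duplication
all_names <- c(rownames(foo), rownames(bar))

# make.unique() appends a sequence number to repeated names
deduped <- make.unique(all_names, sep = "")

# Compare with the row names rbind() actually assigns
combined <- rbind(foo, bar)
print(deduped)
print(rownames(combined))
```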

But in the matrix case the duplicated row name is kept:

rbind(as.matrix(foo), as.matrix(bar))
#   a b
# w 1 5
# x 2 6
# y 3 7
# x 2 6
# z 4 8

Here is the problem: how can I rbind() two data frames so that rows with a duplicated row name are removed?

Answer 1:

How about

duprows <- which(!is.na(match(rownames(bar),rownames(foo))))
rbind(foo,bar[-duprows,])

?
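
One caveat with negative numeric indices: when no row names match, which() returns integer(0), and indexing with -integer(0) selects no rows rather than all of them. A minimal sketch, where baz is a made-up data frame that shares no row names with foo:

```r
foo <- data.frame(a = 1:3, b = 5:7)
rownames(foo) <- c("w", "x", "y")

# Hypothetical data frame sharing no row names with foo
baz <- data.frame(a = 9, b = 10)
rownames(baz) <- "q"

duprows <- which(!is.na(match(rownames(baz), rownames(foo))))
length(duprows)          # 0: nothing matched

# -integer(0) is still integer(0), so this selects no rows at all
lost <- baz[-duprows, ]
nrow(lost)               # 0, although the row "q" should have been kept

# A logical index handles the no-match case
kept <- baz[!(rownames(baz) %in% rownames(foo)), ]
nrow(kept)               # 1
```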

Or (based on comments below)

duprows <- rownames(bar) %in% rownames(foo)
rbind(foo, bar[!duprows,])

Several variations are possible, depending on (1) whether you select the matched or the unmatched rows, and (2) whether you index them with numeric positions or logical values.
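
These variations might look as follows, reusing foo and bar from the question. The helper name rbind_unique is made up for this sketch and is not part of base R; drop = FALSE is added on the assumption that the inputs could have a single column, where plain row indexing would return a vector instead of a data frame.

```r
foo <- data.frame(a = 1:3, b = 5:7)
rownames(foo) <- c("w", "x", "y")
bar <- data.frame(a = c(2, 4), b = c(6, 8))
rownames(bar) <- c("x", "z")

# Variation 1: logical index of the unmatched rows
keep_lgl <- !(rownames(bar) %in% rownames(foo))

# Variation 2: numeric positions of the unmatched rows
keep_num <- which(is.na(match(rownames(bar), rownames(foo))))

# Variation 3: names of the unmatched rows
keep_chr <- setdiff(rownames(bar), rownames(foo))

# Hypothetical helper: append only the rows of y whose names are not in x
rbind_unique <- function(x, y) {
  keep <- !(rownames(y) %in% rownames(x))
  rbind(x, y[keep, , drop = FALSE])
}

res <- rbind_unique(foo, bar)
print(res)
```

All three indices select the same single row of bar here. The logical form is the one used in the helper because it stays correct when nothing matches.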