Method to rbind xts objects that removes duplicates

Posted 2019-04-15 01:52

Question:

Is there currently any method for xts objects that rbinds columns by name and, where index entries are duplicated, keeps either all of the first object's rows or all of the second object's rows?

I can rbind the data and then remove duplicate index entries. However, I believe this will by default keep the first object's rows whenever an index value is duplicated.

Answer 1:

I don't believe there is an xts method for this, but we can still make it work in at least a couple of ways.
If you look at ?rbind.xts, you'll see this:

Identical indexed series are bound in the order of the arguments passed to rbind.

We can use that to our advantage.
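Here is a minimal sketch of that documented tie ordering. It uses two hypothetical one-row series, `a` and `b`, which are not part of the original answer and share a single date. The series passed first to `rbind()` should supply the first of the tied rows.

```r
library(xts)

# Two hypothetical one-row series with the same Date index.
a <- xts(1, order.by = as.Date("2000-02-02"))
b <- xts(2, order.by = as.Date("2000-02-02"))

# Rows with identical timestamps are kept in argument order: a first, then b.
ab <- rbind(a, b)
print(coredata(ab)[, 1])

# Swapping the arguments swaps the order of the tied rows.
ba <- rbind(b, a)
print(coredata(ba)[, 1])
```

The deduplication below relies on this property. If it holds in your xts version, the object passed first always contributes the first row of each tie.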

First, some example data:

library(xts)

structure(c(5, 4, 2, 2, 4, 3, 3, 5),
          class = c("xts", "zoo"),
          .indexCLASS = "Date", tclass = "Date",
          .indexTZ = "UTC", tzone = "UTC",
          index = structure(c(949449600, 949536000, 949708800, 949795200,
                              949881600, 949968000, 950054400, 950227200),
                            tzone = "UTC", tclass = "Date"),
          .Dim = c(8L, 1L)) -> d1

structure(c(3, 3, 3, 4, 2, 3, 3, 5),
          class = c("xts", "zoo"),
          .indexCLASS = "Date", tclass = "Date",
          .indexTZ = "UTC", tzone = "UTC",
          index = structure(c(948931200, 949104000, 949190400, 949449600,
                              949536000, 949622400, 949708800, 950054400),
                            tzone = "UTC", tclass = "Date"),
          .Dim = c(8L, 1L)) -> d2
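The numeric index values in the dput() output above are seconds since 1970-01-01 UTC. For readability, here is an assumed equivalent of d1 and d2 built with xts() and explicit dates. This rebuild is not part of the original answer. The sketch also lists the dates the two series share, which are the rows rbind() will duplicate.

```r
library(xts)

# Assumed readable equivalent of the dput() objects above (Date index).
d1 <- xts(c(5, 4, 2, 2, 4, 3, 3, 5),
          order.by = as.Date(c("2000-02-02", "2000-02-03", "2000-02-05", "2000-02-06",
                               "2000-02-07", "2000-02-08", "2000-02-09", "2000-02-11")))
d2 <- xts(c(3, 3, 3, 4, 2, 3, 3, 5),
          order.by = as.Date(c("2000-01-27", "2000-01-29", "2000-01-30", "2000-02-02",
                               "2000-02-03", "2000-02-04", "2000-02-05", "2000-02-09")))

# rbind() keeps every row, so shared dates appear twice: 8 + 8 = 16 rows.
print(nrow(rbind(d1, d2)))

# Dates covered by both series, i.e. the rows that will be duplicated.
overlap <- index(d1)[index(d1) %in% index(d2)]
print(overlap)
```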

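If we then call rbind(), rows with identical dates come out in the order in which we supplied d1 and d2. Each shared date therefore has d1's row first and d2's row second. We can then use duplicated() to flag the second occurrence of each date and negate that logical vector with ! to deselect those rows.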

dat.bind <- rbind(d1, d2)

# Keep the first (d1) row of each duplicated date
dat.bind.d1 <- dat.bind[!duplicated(time(dat.bind))]
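This sketch checks what that selection keeps. It rebuilds the example data with xts() and explicit dates, assuming this is equivalent to the dput() objects above. It then confirms that every date covered by d1 carries d1's value.

```r
library(xts)

# Assumed readable equivalent of d1 and d2 from the dput() output above.
d1 <- xts(c(5, 4, 2, 2, 4, 3, 3, 5),
          order.by = as.Date(c("2000-02-02", "2000-02-03", "2000-02-05", "2000-02-06",
                               "2000-02-07", "2000-02-08", "2000-02-09", "2000-02-11")))
d2 <- xts(c(3, 3, 3, 4, 2, 3, 3, 5),
          order.by = as.Date(c("2000-01-27", "2000-01-29", "2000-01-30", "2000-02-02",
                               "2000-02-03", "2000-02-04", "2000-02-05", "2000-02-09")))

dat.bind <- rbind(d1, d2)                              # ties ordered d1, then d2
dat.bind.d1 <- dat.bind[!duplicated(time(dat.bind))]  # drop the second row of each tie

# 16 rows minus the 4 repeated dates leaves 12 distinct dates.
print(nrow(dat.bind.d1))

# Restricted to d1's dates, the result should match d1 exactly.
kept <- dat.bind.d1[index(dat.bind.d1) %in% index(d1)]
print(all(coredata(kept) == coredata(d1)))
```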

To keep d2's values instead, we can either switch the order of the arguments in rbind(), or shift the negated logical vector one position to the left. To shift it, drop its first element and append TRUE. This deselects the first, rather than the second, of two identical index values.

# Shift the keep-vector one position left so the later (d2) row of each tie is kept
dat.bind.d2 <- dat.bind[c(!duplicated(time(dat.bind))[-1], TRUE)]
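Base R's duplicated() also has a fromLast argument. With fromLast = TRUE, it flags every occurrence of a value except the last one, so the earlier (d1) row of each tie is the one dropped. The sketch below compares this with the shifted vector, using the same assumed xts() rebuild of the example data. This alternative is a suggestion and is not part of the original answer.

```r
library(xts)

# Assumed readable equivalent of d1 and d2 from the dput() output above.
d1 <- xts(c(5, 4, 2, 2, 4, 3, 3, 5),
          order.by = as.Date(c("2000-02-02", "2000-02-03", "2000-02-05", "2000-02-06",
                               "2000-02-07", "2000-02-08", "2000-02-09", "2000-02-11")))
d2 <- xts(c(3, 3, 3, 4, 2, 3, 3, 5),
          order.by = as.Date(c("2000-01-27", "2000-01-29", "2000-01-30", "2000-02-02",
                               "2000-02-03", "2000-02-04", "2000-02-05", "2000-02-09")))

dat.bind <- rbind(d1, d2)

# The answer's version: negate duplicated(), drop the first element, append TRUE.
dat.bind.d2 <- dat.bind[c(!duplicated(time(dat.bind))[-1], TRUE)]

# fromLast = TRUE scans from the end, so the earlier row of each tie is flagged.
dat.bind.d2.alt <- dat.bind[!duplicated(time(dat.bind), fromLast = TRUE)]

print(identical(coredata(dat.bind.d2), coredata(dat.bind.d2.alt)))
print(as.numeric(dat.bind.d2.alt["2000-02-02"]))  # d2's value on a shared date
```

Both versions keep the same rows here. The fromLast form avoids building the shifted vector by hand.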

There is one caveat with this approach: d1 and d2 must not individually contain duplicate index values. Otherwise, rows that are duplicated within a single object are dropped as well. If we use merge() instead, we don't have this limitation.
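Here is a small sketch of that caveat, using hypothetical series x and y. Because duplicated() only looks at the combined index, a date repeated inside a single input is treated like a clash between the two inputs, and the extra row disappears without warning. The last lines show one possible guard based on base R's anyDuplicated().

```r
library(xts)

# Hypothetical inputs: x already repeats a date on its own; y does not.
x <- xts(c(1, 2), order.by = as.Date(c("2000-01-01", "2000-01-01")))
y <- xts(3, order.by = as.Date("2000-01-02"))

bound <- rbind(x, y)                        # 3 rows
deduped <- bound[!duplicated(time(bound))]  # x's second row is dropped as well

print(nrow(deduped))

# A cheap guard to run before relying on the rbind()/duplicated() approach.
inputs_ok <- anyDuplicated(index(x)) == 0L && anyDuplicated(index(y)) == 0L
print(inputs_ok)
```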

We do an outer join, meaning that every index value from either object is included, with NAs filled in where one object has no row. Then we can simply replace the NAs in one column with the values at the same index in the other column.

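dat.merged <- merge(d1, d2, join="outer")

# Keep d1's values; where d1 is NA, take d2's value on the same date
dat.merged.d1 <- replace(dat.merged[, 1],
                         is.na(dat.merged[, 1]),
                         dat.merged[is.na(dat.merged[, 1]), 2])

# Keep d2's values; where d2 is NA, take d1's value on the same date
dat.merged.d2 <- replace(dat.merged[, 2],
                         is.na(dat.merged[, 2]),
                         dat.merged[is.na(dat.merged[, 2]), 1])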
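The merge() steps can be wrapped in a small function. prefer_merge() below is a hypothetical helper, not an xts function. It keeps every row of its first argument and falls back to the second argument only where the first is NA. Using the assumed xts() rebuild of the example data, the sketch also checks that it agrees with the rbind()/duplicated() result.

```r
library(xts)

# Assumed readable equivalent of d1 and d2 from the dput() output above.
d1 <- xts(c(5, 4, 2, 2, 4, 3, 3, 5),
          order.by = as.Date(c("2000-02-02", "2000-02-03", "2000-02-05", "2000-02-06",
                               "2000-02-07", "2000-02-08", "2000-02-09", "2000-02-11")))
d2 <- xts(c(3, 3, 3, 4, 2, 3, 3, 5),
          order.by = as.Date(c("2000-01-27", "2000-01-29", "2000-01-30", "2000-02-02",
                               "2000-02-03", "2000-02-04", "2000-02-05", "2000-02-09")))

# Hypothetical helper (not part of xts): outer-join x and y, keep x's value
# wherever it exists, and fall back to y's value where x is NA.
prefer_merge <- function(x, y) {
  m <- merge(x, y, join = "outer")
  vals <- coredata(m)
  filled <- ifelse(is.na(vals[, 1]), vals[, 2], vals[, 1])
  xts(filled, order.by = index(m))
}

dat.merged.d1 <- prefer_merge(d1, d2)  # d1 wins on shared dates
dat.merged.d2 <- prefer_merge(d2, d1)  # d2 wins on shared dates

# Cross-check against the rbind()/duplicated() approach.
dat.bind <- rbind(d1, d2)
dat.bind.d1 <- dat.bind[!duplicated(time(dat.bind))]
print(all(coredata(dat.merged.d1) == coredata(dat.bind.d1)))
print(all(index(dat.merged.d1) == index(dat.bind.d1)))
```

Like the replace() version above, this also fills any genuine NAs in the preferred series wherever the other series has a value on that date.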