How to column bind two ffdf

2019-01-28 16:20发布

问题:

Suppose two ffdf files:

library(ff)
ff1 <- as.ffdf(data.frame(matrix(rnorm(10*10),ncol=10)))
ff2 <- ff1
colnames(ff2) <- 1:10

How can I column bind these without loading them into memory? cbind doesn't work.

There is the same question http://stackoverflow.com/questions/18355686/columnbind-ff-data-frames-in-r but it does not have an MWE and the author abandoned it so I reposted.

回答1:

You can use the following construct cbind.ffdf2, making sure the column names of the two input ffdf's are not duplicate:

library(ff)
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5))
ff2 <- as.ffdf(data.frame(letB = letters[6:10], numB = 6:10))

cbind.ffdf2 <- function(d1, d2){
  D1names <- colnames(d1)
  D2names <- colnames(d2)
  mergeCall <- do.call("ffdf", c(physical(d1), physical(d2)))
  colnames(mergeCall) <- c(D1names, D2names)
  mergeCall
}

cbind.ffdf2(ff1, ff2)[,]

Result:

   letA numA letB numB
1   a    1    f     6
2   b    2    g     7
3   c    3    h     8
4   d    4    i     9
5   e    5    j    10


回答2:

Sorry for joining this late.If you want to cbind an arbitrary number of ffdf objects without worrying of duplicate columns. You can try this (building on Audrey's solution).

ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5))
ff2 <- as.ffdf(data.frame(letA = letters[6:10], numB = 6:10))

cbind.ffdf2 <- function(...){
  argl <- list(...)
  if(length(argl) == 1L){
    return(argl[[1]])
  }else{
    physicalList = NULL
    for(i in 1:length(argl)){
      if(class(argl[[i]]) == "data.frame"){
        physicalList = c(physicalList, physical(as.ffdf(argl[[i]])))
      }else{
        physicalList = c(physicalList, physical(argl[[i]]))
      }

    }
    mergeCall <- do.call("ffdf", physicalList)
    return(mergeCall)
  }

}

cbind.ffdf2(ff1, ff2)

It also coarses any data frame object in the list to an ffdf object.