Suppose I have two ffdf objects:
library(ff)
ff1 <- as.ffdf(data.frame(matrix(rnorm(10*10),ncol=10)))
ff2 <- ff1
colnames(ff2) <- 1:10
How can I column-bind these without loading them into memory? cbind doesn't work.
The same question was asked at http://stackoverflow.com/questions/18355686/columnbind-ff-data-frames-in-r
but it has no minimal working example (MWE) and the author abandoned it, so I am reposting it here.
You can use the following function, cbind.ffdf2, as long as the two input ffdf objects do not share any column names:
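library(ff)
# stringsAsFactors = TRUE: ff cannot store character vectors, so keep the
# letters as factors (this was the data.frame default before R 4.0)
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5, stringsAsFactors = TRUE))
ff2 <- as.ffdf(data.frame(letB = letters[6:10], numB = 6:10, stringsAsFactors = TRUE))
cbind.ffdf2 <- function(d1, d2){
  D1names <- colnames(d1)
  D2names <- colnames(d2)
  # Build a new ffdf from the physical ff vectors of both inputs;
  # the existing vectors are reused, so nothing is read into RAM
  mergeCall <- do.call("ffdf", c(physical(d1), physical(d2)))
  # Restore the (virtual) column names of the inputs
  colnames(mergeCall) <- c(D1names, D2names)
  mergeCall
}
# [,] pulls the (small) result into memory as a data.frame for printing
cbind.ffdf2(ff1, ff2)[,]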
Result:
  letA numA letB numB
1    a    1    f    6
2    b    2    g    7
3    c    3    h    8
4    d    4    i    9
5    e    5    j   10
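One way to check that this construct really avoids copying is to compare the backing files of the result's columns with those of the inputs. The sketch below is a minimal illustration, assuming ff's filename() accessor for ff vectors; it uses numeric columns only so that no factor conversion is involved.

```r
library(ff)

# Two small ffdf objects with distinct column names
ff1 <- as.ffdf(data.frame(numA = 1:5, valA = c(0.5, 1.5, 2.5, 3.5, 4.5)))
ff2 <- as.ffdf(data.frame(numB = 6:10))

# Same idea as cbind.ffdf2: a new ffdf over the inputs' physical ff vectors
combined <- do.call("ffdf", c(physical(ff1), physical(ff2)))

# The column in the result refers to the same file on disk as the input column,
# i.e. the data were not duplicated
same_file <- filename(physical(combined)$numA) == filename(physical(ff1)$numA)
print(same_file)
print(dim(combined))
```

Because the files are shared, writing to a column of the combined ffdf also changes the corresponding column of the input it came from.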
Sorry for joining this late. If you want to cbind an arbitrary number of ffdf objects without worrying about duplicate columns, you can try this (building on Audrey's solution):
library(ff)
ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5, stringsAsFactors = TRUE))
ff2 <- as.ffdf(data.frame(letA = letters[6:10], numB = 6:10, stringsAsFactors = TRUE))
cbind.ffdf2 <- function(...){
  argl <- list(...)
  if(length(argl) == 1L){
    # Nothing to bind
    return(argl[[1]])
  }else{
    physicalList <- NULL
    for(i in seq_along(argl)){
      # Convert in-memory data.frames to ffdf before taking their ff vectors
      # (is.data.frame() is safer than class(x) == "data.frame")
      if(is.data.frame(argl[[i]])){
        physicalList <- c(physicalList, physical(as.ffdf(argl[[i]])))
      }else{
        physicalList <- c(physicalList, physical(argl[[i]]))
      }
    }
    # One new ffdf over all collected ff vectors
    mergeCall <- do.call("ffdf", physicalList)
    return(mergeCall)
  }
}
cbind.ffdf2(ff1, ff2)
It also coerces any data.frame argument to an ffdf object before binding.
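If you would rather not rely on how ffdf() treats clashing names, one option is to make the collected names unique yourself before building the ffdf. The sketch below is a hypothetical variant (cbind_ffdf_unique is not part of ff); it uses base R's make.unique(), which turns c("x", "x") into c("x", "x.1"), and otherwise follows the same physical-vector approach as above.

```r
library(ff)

# Hypothetical helper: column-binds any number of ffdf objects (plain
# data.frames are converted with as.ffdf first) by reusing their ff vectors.
# Clashing column names are renamed with make.unique().
cbind_ffdf_unique <- function(...) {
  args <- list(...)
  # Named list of ff vectors, one entry per column of each argument
  cols <- do.call(c, lapply(args, function(x) {
    if (is.data.frame(x)) x <- as.ffdf(x)
    physical(x)
  }))
  names(cols) <- make.unique(names(cols))
  do.call("ffdf", cols)
}

a <- as.ffdf(data.frame(x = 1:3, y = c(1.5, 2.5, 3.5)))
b <- as.ffdf(data.frame(x = 4:6, z = 7:9))
res <- cbind_ffdf_unique(a, b, data.frame(w = 10:12))
print(colnames(res))
```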