How to use apply or sapply or lapply with ffdf?

Posted 2019-02-11 07:25

Question:

Is there a way to use an apply-type construct directly on the columns of an ffdf object? I am trying to count the NAs in each column without having to convert it to a standard data frame. I can get the NA count for an individual column using:

sum(is.na(ffdf$columnname))

But is there a way to do this for all the columns of the ffdf at once, something like:

lapply(ffdf, function(x){sum(is.na(x))})

When I run this I get:

$virtual
[1] 0

$physical
[1] 0

$row.names
[1] 0

I have not been able to find a special version of lapply or sapply in the ff documentation. Further, is there a simple way to count the NAs over the entire ffdf in one go?

Answer 1:

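An ffdf is essentially a list with the elements "virtual", "physical", and "row.names", which is why lapply over the ffdf itself returns those three entries. The columns are stored in the physical element, so running lapply over that element gives you what you want.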

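library(ffbase)  # provides is.na and sum methods for ff vectors
myffdf <- as.ffdf(iris)
# physical() returns the list of ff vectors that back the columns
lapply(physical(myffdf), FUN = function(x) sum(is.na(x)))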

Because is.na and sum are generic, this dispatches to is.na.ff and sum.ff from package ffbase, so the data is loaded into RAM chunk by chunk, in sizes your computer can handle.
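
For the second part of the question (one NA count over the entire ffdf), a minimal sketch along the same lines: using sapply instead of lapply should give a named vector of per-column counts, and summing that vector gives the total. The data frame df with injected NAs and the names na_per_column and total_na are illustrative assumptions, not part of the original post.

```r
library(ff)
library(ffbase)  # assumed installed; supplies is.na and sum methods for ff vectors

# Illustrative data: iris with a few NAs injected into numeric columns
df <- iris
df[c(1, 5), "Sepal.Length"] <- NA
df[3, "Petal.Width"] <- NA
myffdf <- as.ffdf(df)

# Per-column NA counts, simplified to a vector by sapply
na_per_column <- sapply(physical(myffdf), function(x) sum(is.na(x)))

# NA count over the whole ffdf
total_na <- sum(na_per_column)

print(na_per_column)
print(total_na)
```

Only the small vector of counts is held in memory at the end; the columns themselves are still processed through the ff methods, as in the lapply version.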



Tags: r, bigdata