It is possible to read .Rdata file format from C o

2019-02-20 02:06发布

问题:

I'm working writing some R extensions on C (C functions to be called from R).

My code needs to compute a statistic using 2 different datasets at the same time, and I need to perform this with all possible pair combinations. Then, I need all these statistics (very large arrays) to continue the calculation on the C side. Those files are very large, typically ~40GB, and that's my problem.

To do this on C called by R, first I need to load all the datasets in R to pass them then to the C function call. But, ideally, it is possible to maintain only 2 of those files on memory at the same time, following the sequence if I were able to access the datasets from C or Fortran directly:

open  file1 - open file2 - compute cov(1,2)
close file2
hold  file1 - open file3 - compute cov(1,3)
... // same approach

This is fine on R because I can load/unload files, but when calling C or Fortran I haven't any mechanism to load/unload files. So, my question is, can I read .Rdata files from Fortran or C directly, being able to open/close them? Any other approaches to the problem?

As far as I've read, the answer is no. So, I'm considering to move from Rdata to HDF5.

回答1:

It is not too hard to call R functions from C, using the .Call interface. So write an R function that inputs the data, and invoke that from C. When you're done with one file, UNPROTECT() the data you've read in. This is illustrated in the following

## function that reads my data in from a single file
fun <- function(fl)
    readLines(fl)

library(inline)  ## party trick -- compile C code from within R
doit <- cfunction(signature(fun="CLOSXP", filename="STRSXP", env="ENVSXP"), '
    SEXP lng = PROTECT(lang2(fun, filename)); // create R language expression
    SEXP ans = PROTECT(eval(lng, env));       // evaluate the expression
    // do things with the ans, e.g., ...
    int len = length(ans);
    UNPROTECT(2);                     // release for garbage collection
    return ScalarInteger(len);        // return something
')

doit(fun, "call.R", environment())

A simpler approach is to invert the problem -- read two data files in, then call C with the data.