Sharing large datasets between Matlab and R

Posted 2019-02-02 10:49

Question:

I need a relatively efficient way to share data between Matlab and R.

I have checked SaveR and MATLAB R-link, but SaveR formats Matlab's binary data as text strings first and then prints them to an ASCII file, which is not efficient for large datasets, and MATLAB R-link only works on Windows (it uses a COM-based interface).

Update:

Dirk has posted a list of what seem to be better solutions to this problem than SaveR and Matlab R-link. I also learned recently about RAM disks (see here and here for some implementation examples), and thought that they might facilitate the task of sharing large datasets between Matlab and R (or similar computational environments) further. This leads me to the following questions:

Assuming that the data fits in the machine's memory in Matlab's or R's native data containers:

  1. Are any of the solutions listed so far a better fit for RAM disks?

  2. Are there any additional considerations to take into account when dealing with RAM disks instead of secondary-storage solutions?

Thanks!

Answer 1:

Couple of ideas, and with the caveat that I know more about the R side of things:

  • The R.matlab package on CRAN can help: "This package provides methods to read and write MAT files. It also makes it possible to communicate (evaluate code, send and retrieve objects etc.) with Matlab v6 or higher running locally or on a remote host."

  • HDF5, as you suggested, is a possibility but I heard that the R support in CRAN package hdf5 is somewhat basic

  • NetCDF may be an alternative; CRAN has packages RNetCDF, ncdf and ncdf4

  • Use a database, especially a light, file-based one like SQLite or H4, both of which have R support

  • Use a common serialization / de-serialization format; R has support for Google Protocol Buffers via RProtoBuf and Google points to protobuf-matlab for Matlab

  • Write your own! Especially when you only need something basic like large rectangular matrices, nothing will beat a direct binary write; I did this once years ago for Octave (which is close to Matlab). You can extend Matlab via mex files; R has its API and helpers like Rcpp. The larger your data sets, the more attractive this may look, as you save the conversions.
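
To make the "direct binary write" idea concrete, here is a minimal, language-neutral sketch of one possible file layout, written in Python only to keep it short and runnable. The layout (two little-endian int32 dimensions followed by float64 values in column-major order) and the function names `write_matrix` and `read_matrix` are assumptions made up for this illustration, not a standard format and not what the answer's author actually used.

```python
import struct


def write_matrix(path, rows):
    """Write a rectangular matrix (a list of equal-length rows) as raw binary.

    Layout assumed by this sketch (hypothetical, not a standard):
      int32 nrow, int32 ncol, then nrow*ncol float64 values,
      all little-endian, stored column by column.
    """
    nrow = len(rows)
    ncol = len(rows[0]) if nrow else 0
    if any(len(r) != ncol for r in rows):
        raise ValueError("matrix must be rectangular")
    # Column-major order matches how Matlab and R lay out matrices in memory,
    # so neither side has to transpose after reading.
    flat = [rows[i][j] for j in range(ncol) for i in range(nrow)]
    with open(path, "wb") as f:
        f.write(struct.pack("<ii", nrow, ncol))
        f.write(struct.pack("<%dd" % len(flat), *flat))


def read_matrix(path):
    """Read a file produced by write_matrix back into a list of rows."""
    with open(path, "rb") as f:
        nrow, ncol = struct.unpack("<ii", f.read(8))
        flat = struct.unpack("<%dd" % (nrow * ncol), f.read(8 * nrow * ncol))
    # Element (i, j) sits at offset j * nrow + i in column-major storage.
    return [[flat[j * nrow + i] for j in range(ncol)] for i in range(nrow)]
```

A file in this layout should be readable without any text conversion: in R with `readBin` (two integers of size 4, then `nrow * ncol` doubles reshaped with `matrix`), and in Matlab with `fread(fid, 2, 'int32')` followed by `fread(fid, [nrow ncol], 'double')`. Pointing `path` at a RAM disk mount would not change the code, only where the bytes land.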



Answer 2:

Recent versions of Matlab use HDF5 natively ("save" and "load"; the version 7.3 MAT-file format is HDF5-based). There is an HDF5 package for R, so HDF5 might be a good solution.