RcppEigen - going from inline to a .cpp function i

2019-05-21 12:17发布

问题:

Everything seems to work in my package, but I wanted to check if the steps to make it were correct and about memory use using "Map". (It's a simple example, somewhere in-between the inline examples and the fastLm() example.)

Here is an inline function that takes the maximum over each column of a matrix:

library(Rcpp); 
library(inline); 
library(RcppEigen);

maxOverColCpp <- '
    using Eigen::Map;
    using Eigen::MatrixXd;

   // Map the double matrix AA from R
   const Map<MatrixXd> A(as<Map<MatrixXd> >(AA));

   // evaluate and columnwise maximum entry of A
   const MatrixXd Amax(A.colwise().maxCoeff());
   return wrap(Amax);
'

rcppeigen_max_over_columns <- cxxfunction(signature(AA = "matrix"), maxOverColCpp, plugin = "RcppEigen")

Then to change the function to include it in an existing R package, I rewrote the code as follows, saved it in rcppeigen_max_over_columns.cpp in a new src folder in an existing R package:

// we only include RcppEigen.h which pulls Rcpp.h in for us
#include <RcppEigen.h>

// via the depends attribute we tell Rcpp to create hooks for
// RcppEigen so that the build process will know what to do
//
// [[Rcpp::depends(RcppEigen)]]

// via the exports attribute we tell Rcpp to make this function
// available from R
//
// [[Rcpp::export]]
Eigen::MatrixXd rcppeigen_max_over_columns(const Eigen::MatrixXd & A){
    Eigen::MatrixXd Amax = A.colwise().maxCoeff();
    return Amax;
}

(In fact it was a bit longer as I also needed to include finding the maximum over rows.)

Then:

  • modified the DESCRIPTION FILE with the lines:

    Imports: Rcpp (>= 0.11.3), RcppEigen (>= 0.3.2.2.0)

    LinkingTo: Rcpp, RcppEigen

  • modified the NAMESPACE file with the lines:

    useDynLib(toyRpackage)

    import(RcppEigen)

    importFrom(Rcpp, evalCpp)

  • in R terminal, typed this, which I assume glues the R and C++:

    Rcpp::compileAttributes(pkgdir="toyRpackage", verbose=getOption("verbose"))

Then as for a regular package, I did R CMD check and R CMD build.

  • First question is whether this process for including an RcppEigen function into an existing R package is correct? (I completely ignored any Makevars files or any .h files -- I don't really know what they do... Also don't really understand the changes to the NAMESPACE file. I tried to copy the RcppEigen.package.skeleteon() set-up but I am adding my function to an existing package. So it would be good to know if it's okay in case I missed something that could be a problem later.)

  • Second question is whether I need a "Map" somewhere in rcppeigen_max_over_columns.cpp so that the matrix isn't copied when it's passed from R to C++?

I know it's a beginner question, but I'm having some trouble understanding the syntax in .cpp files as I don't know any C++. I thought maybe this question might help someone else who is also trying to add a simple function to their package. :)

Also, if you have any strong feelings about using RcppEigen over RcppArmadillo, please let me know. I read http://thread.gmane.org/gmane.comp.lang.r.rcpp/3522 which was useful. For my example of taking max over columns, RcppEigen seems much faster, not sure why.

回答1:

First question is whether this process for including an RcppEigen function into an existing R package is correct? (I completely ignored any Makevars files or any .h files -- I don't really know what they do... Also don't really understand the changes to the NAMESPACE file. I tried to copy the RcppEigen.package.skeleteon() set-up but I am adding my function to an existing package. So it would be good to know if it's okay in case I missed something that could be a problem later.)

For basic R packages with relatively simple C++ code, you do not need to include header files, or a custom Makevars / Makefile or anything like that. If you build something more complicated, you might need a Makefile / Makevars to help handle the build process, and might want to use header files to separate the interface from the implementation -- but for that you'll have to dive in the deep end and pick up some C++ books, because there's a lot to learn.

In other words -- what you're doing is perfectly fine. For simple cases, it's fine to just use .cpp files in the src/ directory (and let Rcpp, attributes, and its other sibling packages handle the rest)

Second question is whether I need a "Map" somewhere in rcppeigen_max_over_columns.cpp so that the matrix isn't copied when it's passed from R to C++?

Well, data will almost always be copied when transferring an R object to a (non-Rcpp) class, unless you specifically use a constructor that can re-use the underlying data. I am not aware of whether Eigen has a constructor that can re-use memory, but I would suggest that unless you know that it matters, don't worry about it (because copying a range of data is typically quite fast)



回答2:

For the second question, it turns out this modification (thanks to my colleague Eric Lin) was all that was necessary, and it does seem to help, at least for benchmark examples of A <- matrix(rnorm(10000*100), 10000, 100). The max over columns is about 30 times faster than R with original cpp function, and 100 times faster than R using Eigen::Map.

Eigen::MatrixXd rcppeigen_max_over_columns(const Eigen::Map<Eigen::MatrixXd>  & A){
    Eigen::MatrixXd Amax = A.colwise().maxCoeff();
    return Amax;
}