I'm trying to find a speedier replacement for finding duplicates in R. The intent of the code is to pass a matrix to Rcpp along with a row number from that matrix, then loop through the entire matrix looking for a match for that row. The matrix in question is a logical matrix with 1000 rows and 250 columns.
Sounds simple, but the code below is not detecting equivalent rows. I'm not sure whether the issue is with std::equal or with how the matrix or vectors are defined.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::plugins]]
#include <cstddef> // std::size_t
#include <iterator> // std::begin, std::end
#include <vector> // std::vector
#include <iostream>
#include <string>
// [[Rcpp::export]]
bool dupCheckRcpp (int nVector,
                   LogicalMatrix bigMatrix) {
    // initialize
    int i, j, nrow, ncol;
    nrow = bigMatrix.nrow();
    ncol = bigMatrix.ncol();
    LogicalVector vec(ncol); // holds vector of interest
    LogicalVector vecMatrix(ncol); // temp vector for loop through bigMatrix
    nVector = nVector - 1; // convert R's 1-based row number to a 0-based index
    // copy bigMatrix data into vec based on nVector row
    for ( j = 0; j < ncol; ++j ) {
        vec(j) = bigMatrix(nVector,j);
    }
    // check loop: check vec against each row in bigMatrix
    for (i = 0; i < nrow; ++i) {
        // copy bigMatrix data into vecMatrix
        for ( j = 0; j < ncol; ++j ) {
            vecMatrix(j) = bigMatrix(i,j);
        }
        // check for equality
        if (i != nVector) { // skip if nVector row
            // compare vec to vecMatrix
            if (std::equal(vec.begin(),vec.end(),vecMatrix.begin())) {
                return true;
            }
        }
    } // close check loop
    return false;
}
I'm not exactly sure where the mistake lies in your code, but note that you really shouldn't ever need to manually copy elements between Rcpp types with an element-wise for loop, as the code in the question does.
There is almost always a suitable class and/or an appropriate assignment operator that lets you accomplish this more succinctly and more safely (i.e. with less room for programming error). A simpler implementation works on the matrix rows directly, as described next.
In the spirit of my advice above, the declaration const LogicalMatrix::Row& y = x.row(r); gives us a constant reference to the r-th row of the matrix x, and x.row(i) refers to the i-th row of x. Both of these expressions avoid element-wise copying via a for loop, and are more readable IMO. Additionally, while there is certainly nothing wrong with using std::equal or any other standard algorithm, using Rcpp sugar expressions such as is_true(all(y == x.row(i))) can often simplify your code even further.
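To make the "compare rows in place, never copy them" idea concrete, here is a minimal standalone sketch. It is deliberately not Rcpp code: it uses only the standard library so it compiles outside of R. The names BoolMatrix, rows_equal and has_duplicate_row are hypothetical, invented for this illustration. It assumes the matrix is stored column-major (the layout R uses) and it does not model NA values, so it only approximates what is_true(all(y == x.row(i))) does on a real LogicalMatrix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a logical matrix. Values are stored column-major,
// so element (i, j) lives at data[i + j * nrow]. NA is not modeled.
struct BoolMatrix {
    std::size_t nrow;
    std::size_t ncol;
    std::vector<int> data;  // 0 = FALSE, 1 = TRUE

    int at(std::size_t i, std::size_t j) const { return data[i + j * nrow]; }
};

// Compare rows a and b in place, stopping at the first mismatch.
// No temporary row vectors are created.
bool rows_equal(const BoolMatrix& x, std::size_t a, std::size_t b) {
    for (std::size_t j = 0; j < x.ncol; ++j) {
        if (x.at(a, j) != x.at(b, j)) return false;
    }
    return true;
}

// True if any row other than r (0-based) is identical to row r.
bool has_duplicate_row(const BoolMatrix& x, std::size_t r) {
    for (std::size_t i = 0; i < x.nrow; ++i) {
        if (i != r && rows_equal(x, r, i)) return true;
    }
    return false;
}
```

Here rows_equal plays roughly the role of the sugar comparison, and has_duplicate_row plays the role of the exported function. Note that the index r is 0-based in this sketch, so a row number coming from R would need 1 subtracted first, as the question's code already does.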