I have an object of class `big.matrix` in R with dimensions 778844 x 2. The values are all integers (kilometres). My objective is to calculate the Euclidean distance matrix from this `big.matrix` and have the result be an object of class `big.matrix` as well. I would like to know if there is an optimal way of doing that.
The reason I chose the class `big.matrix` is memory limitation. I could convert my `big.matrix` to an object of class `matrix` and calculate the Euclidean distance matrix using `dist()`. However, `dist()` would return an object too large to be allocated in memory.
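To make the memory problem concrete, here is a small back-of-the-envelope sketch in C++. It assumes that `dist()` stores one 8-byte double per unordered pair of rows, i.e. n(n-1)/2 values; the function names `pair_count` and `dist_bytes` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Number of unordered row pairs (i < j) among n rows: n * (n - 1) / 2.
std::uint64_t pair_count(std::uint64_t n) {
    assert(n >= 1);  // guards against unsigned wrap-around for n == 0
    return n * (n - 1) / 2;
}

// Bytes needed to hold one 8-byte double per pair
// (assumption: this is the payload a dist object would need).
std::uint64_t dist_bytes(std::uint64_t n) {
    const std::uint64_t bytes_per_double = 8;
    return pair_count(n) * bytes_per_double;
}
```

For n = 778844 this gives 303,298,598,746 pairs, which is roughly 2.4 TB of doubles, far beyond the RAM of a typical machine.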
Edit
The following answer was given by John W. Emerson, author and maintainer of the `bigmemory` package:
You could use big algebra I expect, but this would also be a very nice use case for Rcpp via sourceCpp(), and very short and easy. But in short, we don't even attempt to provide high-level features (other than the basics which we implemented as proof-of-concept). No single algorithm could cover all use cases once you start talking out-of-memory big.
Here is a way using `RcppArmadillo`. Much of this is very similar to the Rcpp Gallery example. This will return a `big.matrix` with the associated pairwise (by row) Euclidean distances. I like to wrap my `big.matrix` functions in a wrapper function to create a cleaner syntax (i.e. avoid the `@address` and other initializations).
Note: as we are using bigmemory (and are therefore concerned with RAM usage), this example returns the N-1 x N-1 matrix of only the lower triangular elements. You could modify this, but this is what I threw together.
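To illustrate the lower-triangular N-1 x N-1 layout described above, here is a minimal standalone C++ sketch. It deliberately does not use Rcpp, RcppArmadillo, or bigmemory, and it is not the answer's own code: the helper name `lower_tri_dist`, the plain `std::vector` storage, and the exact indexing (row i-1, column j holds the distance between points i and j for j < i) are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of bigmemory or RcppArmadillo.
// `x` holds n points of dimension d in column-major order (the layout R uses):
// coordinate k of point i is x[i + k * n].
// Returns an (n-1) x (n-1) column-major matrix `out` where, for j < i,
// out[(i - 1) + j * (n - 1)] is the Euclidean distance between points i and j.
// Entries above the diagonal are left at 0.
std::vector<double> lower_tri_dist(const std::vector<double>& x,
                                   std::size_t n, std::size_t d) {
    assert(n >= 2 && x.size() == n * d);
    const std::size_t m = n - 1;
    std::vector<double> out(m * m, 0.0);
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t j = 0; j < i; ++j) {
            double sq = 0.0;  // squared distance accumulated over coordinates
            for (std::size_t k = 0; k < d; ++k) {
                const double diff = x[i + k * n] - x[j + k * n];
                sq += diff * diff;
            }
            out[(i - 1) + j * m] = std::sqrt(sq);
        }
    }
    return out;
}
```

The sketch keeps the result in an in-memory vector for clarity. For an input the size of the one in the question, the output would instead have to be written into a file-backed `big.matrix`, since even the N-1 x N-1 result does not fit in RAM.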
euc_dist.cpp
My little wrapper
The test