I am working with a huge dataset and would like to derive the distribution of a test statistic. Hence I need to do calculations with huge matrices (200000 x 200000), and as you might predict, I have memory issues. More precisely, I get the following: `Error: cannot allocate vector of size ... Gb`. I work on the 64-bit version of R and have 8 GB of RAM. I tried to use the package bigmemory, but without much success.
The first issue arises when I have to compute the distance matrix. I found a nice function in the amap package called `Dist` that computes the distances between the columns of a data frame in parallel, and it works well; however, it produces only the lower/upper triangle. I need the full distance matrix to perform matrix multiplications, and unfortunately I cannot do that with only half of the matrix. When I use the `as.matrix` function to make it full, I run into memory issues again.
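Roughly, the workflow looks like this (toy sizes stand in for the real data; `nbproc` sets the number of parallel processes in `amap::Dist`):

```r
library(amap)

x <- matrix(rnorm(100 * 5), nrow = 100)         # toy stand-in for the real data
d <- Dist(x, method = "euclidean", nbproc = 2)  # "dist" object: one triangle only
D <- as.matrix(d)                               # full matrix -- this is the step that
                                                # exhausts memory at 200000 x 200000
```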
So my question is: how can I convert a `dist` object to a `big.matrix`, skipping the `as.matrix` step? I suppose it might be an Rcpp question; please keep in mind that I am really new to Rcpp.
Thanks in advance!
On converting a "dist" object to "(big.)matrix":
`stats:::as.matrix.dist` has calls to `row`, `col`, and `t`, and uses operators that create large intermediate objects. Avoiding these, you could, among other alternatives, use something like the following. With data:
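For instance (the exact data here is a hedged toy reconstruction; `nr` would be on the order of 200000 in the real problem):

```r
set.seed(1)
nr <- 100                                 # toy size; the real case has nr ~ 200000
x  <- matrix(rnorm(nr * 3), nrow = nr)
d  <- dist(x)                             # "dist" object: nr * (nr - 1) / 2 values
n  <- attr(d, "Size")                     # number of observations, == nr
```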
Then, slowly, allocate and fill a "matrix":
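A self-contained sketch of such a loop (`dist_pos` is my own helper encoding the standard position of the pair (i, j), i < j, inside the "dist" vector; the loop indexes that vector directly, so no `row`/`col`/`t` intermediates are created):

```r
set.seed(1)
nr <- 100
d  <- dist(matrix(rnorm(nr * 3), nrow = nr))
n  <- attr(d, "Size")

## position of the pair (i, j), i < j, inside the "dist" vector
dist_pos <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

md <- matrix(0, n, n)                     # a plain matrix; see the big.matrix note below
for (j in 2:n) {
    i <- seq_len(j - 1)
    v <- d[dist_pos(i, j, n)]
    md[i, j] <- v                         # fill the upper triangle...
    md[j, i] <- v                         # ...and mirror into the lower
}
```

Filling one column at a time keeps the peak extra allocation to a single column of length n, rather than the several n x n intermediates that `as.matrix` creates.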
(It seems that allocating `md` as `big.matrix(n, n, init = 0)` works equally well.) Using a smaller `nr` we could test:
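For example, a check against `stats`' own conversion at a size where it is affordable (this is my own verification sketch, reusing the `dist_pos` index helper from above):

```r
set.seed(2)
nr <- 10
d  <- dist(matrix(rnorm(nr * 4), nrow = nr))
n  <- attr(d, "Size")

dist_pos <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

md <- matrix(0, n, n)
for (j in 2:n) {
    i <- seq_len(j - 1)
    md[i, j] <- md[j, i] <- d[dist_pos(i, j, n)]
}

all.equal(md, as.matrix(d), check.attributes = FALSE)
# TRUE
```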