I have a reasonably large sparse matrix (dgCMatrix or dgTMatrix, but that is not very important here), and I want to set some of its elements to zero.
For example, I have a 3e4 * 3e4 matrix which is upper triangular and quite dense: ~23% of its elements are non-zero. (Actually I have much bigger matrices, ~1e5 * 1e5, but they are much sparser.) In triplet dgTMatrix form it takes about 3.1 GB of RAM.
Now I want to set to zero all elements which are less than some threshold (say, 1).
A very naive approach (which was also discussed here) would be the following:
threshold <- 1
m[m < threshold] <- 0
But this solution is far from perfect: it takes 130 sec (on a machine with enough RAM, so there is no swapping) and, more importantly, it needs ~25-30 GB of additional RAM.
The second solution I found (and am mostly happy with) is far more efficient. It constructs a new matrix from scratch:
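threshold <- 1
# m must be in triplet (dgTMatrix) form here: a dgCMatrix has no @j slot.
# Keep entries >= threshold, to match m[m < threshold] <- 0 above.
ind <- which(m@x >= threshold)
m <- sparseMatrix(i = m@i[ind], j = m@j[ind], x = m@x[ind],
                  dims = m@Dim, dimnames = m@Dimnames,
                  index1 = FALSE,       # the @i and @j slots are 0-based
                  giveCsparse = FALSE,  # return a triplet (dgTMatrix) result
                  check = FALSE)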
It takes only ~6 sec and needs ~5 GB of additional RAM.
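For anyone who wants to reproduce the comparison at a small scale, here is a minimal sketch that checks the two approaches give the same result. The sizes, the seed and the names `m_naive` and `m_fast` are made up for illustration. It rebuilds the filtered matrix with `new("dgTMatrix", ...)` rather than `sparseMatrix(...)`, only because I believe the `giveCsparse` argument is handled differently across Matrix versions; it is not a benchmark of the real 3e4 * 3e4 case.

```r
library(Matrix)

set.seed(42)
n <- 200L   # illustrative size, far smaller than the real matrices
k <- 5000L  # number of candidate triplets to draw

# Draw random 0-based (i, j) pairs, keep the upper triangle, drop duplicates.
i0 <- sample.int(n, k, replace = TRUE) - 1L
j0 <- sample.int(n, k, replace = TRUE) - 1L
keep <- i0 <= j0 & !duplicated(cbind(i0, j0))

# Upper-triangular test matrix in triplet form, values in (0, 2).
m <- new("dgTMatrix",
         i = i0[keep], j = j0[keep],
         x = runif(sum(keep), 0, 2),
         Dim = c(n, n))

threshold <- 1

# Approach 1: naive sub-assignment (slow and memory hungry at scale).
m_naive <- m
m_naive[m_naive < threshold] <- 0

# Approach 2: filter the triplet slots and build a new matrix.
ind <- which(m@x >= threshold)
m_fast <- new("dgTMatrix",
              i = m@i[ind], j = m@j[ind], x = m@x[ind],
              Dim = m@Dim, Dimnames = m@Dimnames)

# Both approaches should agree element by element.
same <- isTRUE(all.equal(as.matrix(m_naive), as.matrix(m_fast)))
print(same)
```

Note that the second approach keeps only the entries that pass the filter, while the naive sub-assignment may leave explicitly stored zeros behind, so the two results are compared as dense matrices here rather than slot by slot.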
The question is: can we do better? In particular, can we do this with less RAM? It would be perfect if it could be done in place.