Right now I'm working with some huge matrices in R, and I need to be able to reassemble them from their diagonal bands. For performance reasons (to avoid the n*n operations a full matrix of size n would require, i.e. millions of calculations), I wanted to do only about 2n calculations (thousands of calculations), so I chose to run my function on the diagonal bands of the matrix. Now I have the results, but I need to take these matrix slices and assemble them in a way that lets me use multiple processors.
Both foreach and mclapply won't let me modify objects outside of their loops, so I'm trying to think of a parallel solution. If there is some function that can reliably assign an off-diagonal band to part of a matrix, I'm all for it.
input:
[1] 0.3503037
[1] 0.2851895 0.2851895
[1] 0.5233396 0.5233396 0.5233396
[1] 0.6250584 0.6250584 0.6250584 0.6250584
[1] 0.4300964 0.4300964 0.4300964 0.4300964 0.4300964
[1] 0.4300964 0.4300964 0.4300964 0.4300964 0.4300964
[1] 0.3949782 0.3949782 0.3949782 0.3949782
[1] 0.7852812 0.7852812 0.7852812
[1] 0.5309648 0.5309648
[1] 0.7718504
desired output (with parallel operations):
[,1] [,2] [,3] [,4] [,5]
[1,] 0.4300964 0.6250584 0.5233396 0.2851895 0.3503037
[2,] 0.3949782 0.4300964 0.6250584 0.5233396 0.2851895
[3,] 0.7852812 0.3949782 0.4300964 0.6250584 0.5233396
[4,] 0.5309648 0.7852812 0.3949782 0.4300964 0.6250584
[5,] 0.7718504 0.5309648 0.7852812 0.3949782 0.4300964
The more I look at this, the more I think I need a parallelized version of Matrix::bandSparse.
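For reference, the serial version of what I have in mind would look roughly like this on the 5x5 example above (just a sketch; it assumes Matrix::bandSparse accepts the bands as a list ordered to match the band indices k, and that the duplicated main diagonal is dropped first):

library(Matrix)

n <- 5
bands <- list(0.3503037,
              rep(0.2851895, 2),
              rep(0.5233396, 3),
              rep(0.6250584, 4),
              rep(0.4300964, 5),
              rep(0.3949782, 4),
              rep(0.7852812, 3),
              rep(0.5309648, 2),
              0.7718504)
k <- (n - 1):-(n - 1)   # band indices: n-1 is the upper-right corner, -(n-1) the lower-left
M <- bandSparse(n, n, k = k, diagonals = bands)

But this is still a single serial call and does not parallelize the assembly.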
If you want to build a single matrix you are looking for shared memory parallelism. Both parallel and foreach implement distributed memory parallelism. I know of one R package that implements shared memory (Rdsm), but I have not used it. The more natural way to get shared memory parallelism is by using C++.
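To see why distributed memory is the obstacle here, consider this minimal sketch (my illustration, not from the question): with parallel::mclapply every worker is a forked child process, so assignments to a matrix living in the parent are lost (needs a system with fork, i.e. not Windows):

library(parallel)

n <- 5
mtr <- matrix(0, n, n)
## each child writes into its own copy of 'mtr'
invisible(mclapply(seq_len(n), function(i) {
  mtr[i, i] <- i   # modifies the child's copy only
  NULL
}, mc.cores = 2))
all(mtr == 0)      # TRUE: none of the writes reached the parent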
I have implemented the bands-to-matrix conversion in R (serial), in C++ with Rcpp (serial), and in C++ plus OpenMP with Rcpp and RcppParallel (parallel). Note that the input I used was a list of vectors without a duplicated main diagonal. For the OpenMP solution I converted this list to a (ragged) matrix, since that allows an easy conversion to a thread-safe RMatrix. This conversion is very fast even in R:
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix diags2mtrCpp(int n, const ListOf<const NumericVector>& diags) {
  NumericMatrix mtr(n, n);
  int nDiags = diags.size();
  for (int i = 0; i < nDiags; ++i) {
    NumericVector diag(diags[i]);
    int nDiag = diag.size();
    // 1-based start position of band i, counted from the upper-right corner
    int row = std::max(1, i - n + 2);
    int col = std::max(1, n - i);
    for (int j = 0; j < nDiag; ++j) {
      mtr(row + j - 1, col + j - 1) = diag(j);
    }
  }
  return mtr;
}
// [[Rcpp::plugins(openmp)]]
#include <omp.h>
// [[Rcpp::depends(RcppParallel)]]
#include <RcppParallel.h>
using namespace RcppParallel;

// [[Rcpp::export]]
NumericMatrix diags2mtrOmp(const NumericMatrix& diags_matrix, const IntegerVector& diags_length) {
  int nDiags = diags_matrix.cols();
  int n = diags_matrix.rows();
  NumericMatrix res(n, n);
  // thread-safe accessors for the result and the inputs
  RMatrix<double> mtr(res);
  RMatrix<double> diags(diags_matrix);
  RVector<int> diagSize(diags_length);
  #pragma omp parallel for
  for (int i = 0; i < nDiags; ++i) {
    int nDiag = diagSize[i];
    int row = std::max(1, i - n + 2);
    int col = std::max(1, n - i);
    for (int j = 0; j < nDiag; ++j) {
      mtr(row + j - 1, col + j - 1) = diags(j, i);
    }
  }
  return res;
}
/*** R
set.seed(42)
n <- 2^12
n

## bands as a list of vectors, from the upper-right corner band
## to the lower-left one (the main diagonal appears only once)
diags <- vector(mode = "list", length = 2 * n - 1)
for (i in seq_len(n)) {
  diags[[i]] <- rep.int(runif(1), i)
  diags[[2 * n - i]] <- rep.int(runif(1), i)
}

## ragged-matrix representation for the OpenMP version:
## one zero-padded column per band plus the true band lengths
diags_matrix <- matrix(0, nrow = n, ncol = length(diags))
diags_length <- integer(length(diags))
for (i in seq_along(diags)) {
  diags_length[i] <- length(diags[[i]])
  diags_matrix[, i] <- c(diags[[i]], rep.int(0, n - diags_length[i]))
}

## serial R reference implementation
diags2mtr <- function(n, diags) {
  mtr <- matrix(0, n, n)
  for (i in seq_along(diags)) {
    row <- max(1, i - n + 1)
    col <- max(1, n + 1 - i)
    for (j in seq_along(diags[[i]]))
      mtr[row + j - 1, col + j - 1] <- diags[[i]][j]
  }
  mtr
}

system.time(mtr <- diags2mtr(n, diags))
system.time(mtrCpp <- diags2mtrCpp(n, diags))
system.time(mtrOmp <- diags2mtrOmp(diags_matrix, diags_length))
all.equal(mtr, mtrCpp)
all.equal(mtr, mtrOmp)
*/
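All of the above can go into a single .cpp file; sourcing it compiles both C++ functions and then runs the embedded /*** R ... */ block. A usage sketch, assuming the file is saved as diags2mtr.cpp:

Rcpp::sourceCpp("diags2mtr.cpp")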
Benchmarking these solutions on a dual-core machine gives me:
Unit: milliseconds
expr min lq mean median uq max neval
diags2mtr 2252.82538 2271.7221 2354.1251 2323.8221 2382.7958 2558.9282 10
diags2mtrCpp 161.25920 190.9728 224.9094 226.2652 265.3675 279.3848 10
diags2mtrOmp 95.50714 100.9555 105.8462 102.4064 105.7645 127.5200 10
I am surprised about the speedup from diags2mtrOmp.
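The table above looks like microbenchmark output rather than the system.time() calls from the R block; a minimal sketch for reproducing such a table, assuming the functions and data defined above are in the workspace:

library(microbenchmark)
microbenchmark(
  diags2mtr = diags2mtr(n, diags),
  diags2mtrCpp = diags2mtrCpp(n, diags),
  diags2mtrOmp = diags2mtrOmp(diags_matrix, diags_length),
  times = 10
)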