-->

base::chol() slows down when matrix contains many

Posted 2020-03-26 06:04

Question:

I've noticed that base::chol() severely slows down when the matrix contains many small elements. Here is an example:

## disable multithreading (BLAS and OpenMP threads)
library(RhpcBLASctl); blas_set_num_threads(1); omp_set_num_threads(1)
  • Baseline: create a positive definite matrix and time chol().

    loc <- expand.grid(1:60, 1:50)
    covmat1 <- exp(-as.matrix(dist(loc)))
    mean(c(covmat1))
    # [1] 0.002076862
    system.time(chol1 <- chol(covmat1))
    #   user  system elapsed 
    #  0.313   0.024   0.337 
    
  • More small values: create a matrix covmat2 with many more small entries.

    covmat2 <- exp(-as.matrix(dist(loc))*10)
    mean(c(covmat2))
    # [1] 0.0003333937
    system.time(chol2 <- chol(covmat2))
    #   user  system elapsed 
    #  2.311   0.021   2.333 
    

    Compared to the baseline, this slows down the computation by a factor of about 7 (2.333 s versus 0.337 s elapsed).

  • Set small values to zero: set values of covmat2 that are smaller than 1e-13 to zero.

    covmat3 <- covmat2
    covmat3[covmat3 < 1e-13] <- 0
    mean(c(covmat3))
    # [1] 0.0003333937
    system.time(chol3 <- chol(covmat3))
    #   user  system elapsed 
    #  0.302   0.016   0.318 
    

    This version is fast again, with timings similar to the baseline.
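To make the difference between covmat2 and covmat3 more concrete, it may help to count how the entries fall into magnitude classes. The following is only a minimal sketch: the helper name `magnitude_table()` is hypothetical, and the separate class for subnormal values (nonzero entries below `.Machine$double.xmin`) is an assumption about what might be worth inspecting, not something established by the timings above.

```r
## Hypothetical helper (not part of the original experiment):
## count matrix entries by magnitude class.
magnitude_table <- function(m, cutoff = 1e-13) {
  x <- abs(c(m))
  xmin <- .Machine$double.xmin  # smallest normalized positive double
  c(zero         = sum(x == 0),
    subnormal    = sum(x > 0 & x < xmin),
    below_cutoff = sum(x >= xmin & x < cutoff),
    at_least_cut = sum(x >= cutoff))
}

## Same construction as covmat2 and covmat3 in the experiment
loc <- expand.grid(1:60, 1:50)
covmat2 <- exp(-as.matrix(dist(loc)) * 10)
covmat3 <- covmat2
covmat3[covmat3 < 1e-13] <- 0

print(rbind(covmat2 = magnitude_table(covmat2),
            covmat3 = magnitude_table(covmat3)))
```

Comparing the two rows shows which classes of entries the thresholding step removes.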

Why does this slowdown happen?


Notes:

Repeated evaluations of the timing experiments lead to similar results.
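The repeated timings can be reproduced with something like the sketch below. The helper `time_chol()` and the choice of three repetitions are assumptions made for illustration; absolute numbers will depend on the machine and the BLAS/LAPACK in use.

```r
## Hypothetical helper: run chol() on `m` `reps` times and
## return the elapsed time (in seconds) of each run.
time_chol <- function(m, reps = 3) {
  vapply(seq_len(reps),
         function(i) system.time(chol(m))[["elapsed"]],
         numeric(1))
}

## Rebuild the three matrices from the experiment
loc <- expand.grid(1:60, 1:50)
d <- as.matrix(dist(loc))
covmat1 <- exp(-d)
covmat2 <- exp(-d * 10)
covmat3 <- covmat2
covmat3[covmat3 < 1e-13] <- 0

## Median elapsed time per matrix
timings <- sapply(list(covmat1 = covmat1,
                       covmat2 = covmat2,
                       covmat3 = covmat3),
                  function(m) median(time_chol(m)))
print(timings)
```

Using the median rather than a single run makes the comparison less sensitive to one-off disturbances on the machine.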

I know that for matrices with many values close to zero it might be more efficient to use a sparse matrix approach, e.g., the R package spam.
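For reference, a sparse factorization of the thresholded matrix might look like the sketch below. Note that it uses the Matrix package rather than spam, purely on the assumption that Matrix is available because it ships with standard R installations; the equivalent spam code is not shown here.

```r
library(Matrix)

## Thresholded matrix as in the covmat3 experiment
loc <- expand.grid(1:60, 1:50)
covmat3 <- exp(-as.matrix(dist(loc)) * 10)
covmat3[covmat3 < 1e-13] <- 0

## Store only the nonzero entries, as a symmetric sparse matrix
sp <- forceSymmetric(Matrix(covmat3, sparse = TRUE))

## Sparse Cholesky factor R with t(R) %*% R equal to sp (no pivoting)
system.time(R <- chol(sp))

## Sanity check: largest absolute reconstruction error
print(max(abs(crossprod(R) - sp)))
```

Whether this is worthwhile depends on how sparse the matrix is after thresholding; for covmat2 without thresholding, almost no entries are exactly zero, so a sparse format would not help there.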

sessionInfo()
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 19.2

## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so