Fastest way to get Min from every column in a matrix

Posted 2019-03-25 06:15

What is the fastest way to extract the min from each column in a matrix?


EDIT:

Moved all the benchmarks to the answer below.

Using a Tall, Short or Wide Matrix:

  ##  TEST DATA
  set.seed(1)
  matrix.inputs <- list(
        "Square Matrix"     = matrix(sample(seq(1e6), 4^2*1e4, TRUE), ncol=400),   #  400 x  400
        "Tall Matrix"       = matrix(sample(seq(1e6), 4^2*1e4, TRUE), nrow=4000),  # 4000 x   40
        "Wide-short Matrix" = matrix(sample(seq(1e6), 4^2*1e4, TRUE), ncol=4000),  #   40 x 4000
        "Wide-tall Matrix"  = matrix(sample(seq(1e6), 4^2*1e5, TRUE), ncol=4000),  #  400 x 4000
        "Tiny Sq Matrix"    = matrix(sample(seq(1e6), 4^2*1e2, TRUE), ncol=40)     #   40 x   40
  )

6 answers
放荡不羁爱自由
#2 · 2019-03-25 06:32

Update 2014-12-17:

colMins() et al. were made significantly faster in a recent version of matrixStats. Here's an updated benchmark summary using matrixStats 0.12.2 showing that it ("cmin") is ~5-20 times faster than the second fastest approach:

$`Square Matrix`
     test elapsed relative
2    cmin   0.216    1.000
1     apl   4.200   19.444
5 pmn.int   4.604   21.315
4     pmn   5.136   23.778
3    lapl  12.546   58.083

$`Tall Matrix`
     test elapsed relative
2    cmin   0.262    1.000
1     apl   3.006   11.473
5 pmn.int  18.605   71.011
3    lapl  22.798   87.015
4     pmn  27.583  105.279

$`Wide-short Matrix`
     test elapsed relative
2    cmin   0.346    1.000
5 pmn.int   3.766   10.884
4     pmn   3.955   11.431
3    lapl  13.393   38.708
1     apl  19.187   55.454

$`Wide-tall Matrix`
     test elapsed relative
2    cmin   5.591    1.000
5 pmn.int  39.466    7.059
4     pmn  40.265    7.202
1     apl  67.151   12.011
3    lapl 158.035   28.266

$`Tiny Sq Matrix`
     test elapsed relative
2    cmin   0.011    1.000
5 pmn.int   0.135   12.273
4     pmn   0.178   16.182
1     apl   0.202   18.364
3    lapl   0.269   24.455

Previous comment 2013-10-09:
FYI, since matrixStats v0.8.7 (2013-07-28), colMins() is roughly twice as fast as before. The reason is that the function previously relied on colRanges(), which explains the "surprisingly slow" results reported here. The same speedup applies to colMaxs(), rowMins() and rowMaxs().

贪生不怕死
#3 · 2019-03-25 06:32

mat[(1:ncol(mat)-1)*nrow(mat)+max.col(t(-mat))] seems pretty fast, and it's base R.
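
To unpack why this works: R stores a matrix column by column, so element [i, j] sits at linear position (j - 1) * nrow(mat) + i. max.col(t(-mat)) returns, for each column of mat, the row index of its smallest value, and the offset term converts that into a linear index. A minimal sketch on a made-up 3 x 4 matrix (the name `m` and its values are assumptions for illustration only):

```r
# Hypothetical 3 x 4 example matrix (arbitrary values, no ties within a column)
m <- matrix(c(5, 2, 9,
              7, 8, 1,
              4, 6, 3,
              12, 11, 10), nrow = 3)

# t(-m) has one row per column of m; max.col() returns the position of the
# largest entry in each row, i.e. the row index of each column's minimum
row.of.min <- max.col(t(-m))

# Offset of each column's first element in column-major storage
col.offset <- (seq_len(ncol(m)) - 1) * nrow(m)

# Linear indexing picks out one value per column
col.mins <- m[col.offset + row.of.min]

# Agrees with the apply() version
stopifnot(identical(col.mins, apply(m, 2, min)))
```

When a column contains tied minima, max.col() by default picks one of the tied positions at random; the extracted value is the same either way, and ties.method = "first" makes the chosen position deterministic.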

趁早两清
#4 · 2019-03-25 06:48

The sos package is great for answering these sorts of questions.

library("sos")
findFn("colMins")
library("matrixStats")
?colMins

http://finzi.psych.upenn.edu/R/library/matrixStats/html/rowRanges.html

Oddly enough, for the one example I tried colMins was slower. Perhaps someone can point out what's funny about my example?

set.seed(101); z <- matrix(runif(1e6),nrow=1000)
library(rbenchmark)
benchmark(colMins(z),apply(z,2,min))
##               test replications elapsed relative user.self sys.self
## 2 apply(z, 2, min)          100  14.290     1.00     7.216    7.057
## 1       colMins(z)          100  25.585     1.79    15.509    9.852
兄弟一词,经得起流年.
#5 · 2019-03-25 06:48

Below is a collection of the answers thus far. This will be updated as more answers are contributed.

BENCHMARKS

  library(rbenchmark)
  library(matrixStats)  # for colMins


  list.of.tests <- list (
        ## Method 1: apply()  [original]
        apl = expression(apply(mat, 2, min, na.rm=TRUE)),

        ## Method 2:  matrixStats::colMins [contributed by @Ben Bolker ]
        cmin = expression(colMins(mat)),

        ## Method 3: lapply() + split()  [contributed by @DWin ]
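        ## Group by column index: matrices are stored column-major, so each column
        ## label is repeated nrow times (rep(1:nrow, each=ncol) only matches square matrices)
        lapl = expression(lapply( split(mat, rep(1:dim(mat)[2], each=dim(mat)[1])), min)),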

        ## Method 4: pmin() / pmin.int()  [contributed by @flodel ]
        pmn = expression(do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))),
        pmn.int = expression(do.call(pmin.int, lapply(1:nrow(mat), function(i)mat[i,]))) #,

        ## Method 5: ????
        #  e5 = expression(  ???  ),
        )  


  (times <- 
        lapply(matrix.inputs, function(mat)
            do.call(benchmark, args=c(list.of.tests, replications=500, order="relative"))[, c("test", "elapsed", "relative")]
  ))



  ############################# 
  #$         RESULTS         $#
  #$_________________________$#
  #############################

  # $`Square Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.842    1.000
  # 4     pmn   3.622    1.274
  # 1     apl   3.670    1.291
  # 2    cmin   5.826    2.050
  # 3    lapl  41.817   14.714  

  # $`Tall Matrix`
  #      test elapsed relative
  # 1     apl   2.622    1.000
  # 2    cmin   5.561    2.121
  # 5 pmn.int  11.264    4.296
  # 4     pmn  18.142    6.919
  # 3    lapl  48.637   18.550  

  # $`Wide-short Matrix`
  #      test elapsed relative
  # 5 pmn.int   2.909    1.000
  # 4     pmn   3.018    1.037
  # 2    cmin   6.361    2.187
  # 1     apl  15.765    5.419
  # 3    lapl  41.479   14.259  

  # $`Wide-tall Matrix`
  #      test elapsed relative
  # 5 pmn.int  20.917    1.000
  # 4     pmn  26.188    1.252
  # 1     apl  38.635    1.847
  # 2    cmin  64.557    3.086
  # 3    lapl 434.761   20.785  

  # $`Tiny Sq Matrix`
  #      test elapsed relative
  # 5 pmn.int   0.112    1.000
  # 2    cmin   0.149    1.330
  # 4     pmn   0.174    1.554
  # 1     apl   0.180    1.607
  # 3    lapl   0.509    4.545
来,给爷笑一个
#6 · 2019-03-25 06:55

Here is one that is faster on square and wide matrices. It uses pmin on the rows of the matrix. (If you know a faster way of splitting the matrix into its rows, please feel free to edit)

do.call(pmin, lapply(1:nrow(mat), function(i)mat[i,]))
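
The idea is that pmin() takes several vectors and returns their element-wise minimum, so passing every row as a separate argument yields one minimum per column. A minimal sketch on a made-up small matrix (the name `m` and its values are assumptions for illustration only):

```r
# Hypothetical 3 x 4 example matrix
m <- matrix(c(5, 2, 9,
              7, 8, 1,
              4, 6, 3,
              12, 11, 10), nrow = 3)

# Split the matrix into a list of its rows (each row has length ncol(m))
rows <- lapply(seq_len(nrow(m)), function(i) m[i, ])

# pmin(row1, row2, row3) compares the rows position by position,
# so position j of the result is the minimum of column j
col.mins <- do.call(pmin, rows)

# Agrees with the apply() version
stopifnot(identical(col.mins, apply(m, 2, min)))
```

Since pmin() receives one argument per row, the work grows with nrow(mat), which fits with this approach doing better on wide matrices than on tall ones. Note also that pmin() defaults to na.rm = FALSE, so a column containing NA gives NA unless na.rm = TRUE is passed.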

Using the same benchmark as @RicardoSaporta:

$`Square Matrix`
          test elapsed relative
3 pmin.on.rows   1.370    1.000
1          apl   1.455    1.062
2         cmin   2.075    1.515

$`Wide Matrix`
          test elapsed relative
3 pmin.on.rows   0.926    1.000
2         cmin   2.302    2.486
1          apl   5.058    5.462

$`Tall Matrix`
          test elapsed relative
1          apl   1.175    1.000
2         cmin   2.126    1.809
3 pmin.on.rows   5.813    4.947
SAY GOODBYE
#7 · 2019-03-25 06:55
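# Group by column: matrices are stored column-major, so each column label is repeated nrow times
lapply( split(mat, rep(1:dim(mat)[2], each=dim(mat)[1])), min)

# Check against apply(): no mismatching columns
which( ! apply(my.mat, 2, min, na.rm=TRUE) ==
        sapply( split(my.mat, rep(1:dim(my.mat)[2], each=dim(my.mat)[1])), min) )
# named integer(0)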
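
Because R stores matrices in column-major order, the grouping factor passed to split() needs one label per column, each covering nrow(mat) consecutive elements. A quick sanity check on a non-square matrix, sketched here with base R's col() to build the factor (the matrix `m` is a made-up example, not data from this thread):

```r
# Hypothetical non-square example: 3 rows, 4 columns
m <- matrix(c(5, 2, 9,
              7, 8, 1,
              4, 6, 3,
              12, 11, 10), nrow = 3)

# col(m) gives the column index of every element, in storage order,
# so split() produces exactly one group per column
by.col <- split(m, col(m))

col.mins <- sapply(by.col, min)

# Compare values only: sapply() adds names "1".."4", apply() does not
stopifnot(all(unname(col.mins) == apply(m, 2, min)))
```

Testing on a non-square matrix matters here, since a factor built with the row and column counts swapped still gives the right answer when the matrix happens to be square.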