Is `if` faster than ifelse?

Posted 2019-03-12 21:26

Question:

When I was re-reading Hadley's Advanced R recently, I noticed that in Chapter 6 he says `if` can be called as a function, as in `if`(i == 1, print("yes"), print("no")). (If you have the physical book at hand, it's on page 80.)

We know that ifelse is slow (see "Does ifelse really calculate both of its vectors every time? Is it slow?") because it evaluates all of its arguments. Would `if` be a good alternative, since it seems to evaluate only the branch that is selected (this is just my assumption)?
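A minimal sketch of how that assumption could be checked. The `seen` environment and the `mark()` helper are hypothetical names introduced only for this illustration; they record which branch expressions actually get evaluated.

```r
# Hypothetical helper: records a label each time its argument is evaluated.
seen <- new.env()
seen$labels <- character(0)
mark <- function(label, value) {
  seen$labels <- c(seen$labels, label)
  value
}

# `if` called as a function: only the selected branch is evaluated.
res_if <- `if`(TRUE, mark("yes", 1), mark("no", 2))
seen_if <- seen$labels

# ifelse with a mixed test vector: both yes and no are evaluated.
seen$labels <- character(0)
res_ifelse <- ifelse(c(TRUE, FALSE), mark("yes", 1), mark("no", 2))
seen_ifelse <- seen$labels
```

Per the ifelse documentation, yes is evaluated only if some element of test is true (and likewise for no), so "ifelse evaluates everything" holds for mixed test vectors rather than in every case.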


Update: Based on the answers from @Benjamin and @Roman and the comments from @Gregor and many others, ifelse seems to be the better solution for vectorized calculations. I'm accepting @Benjamin's answer because it provides a more comprehensive comparison and is more useful to the community. However, both answers (and the comments) are worth reading.

Answer 1:

This is more of an extended comment building on Roman's answer, but I need code formatting to expand on it:

Roman is correct that if is faster than ifelse, but I am under the impression that the speed boost of if isn't particularly interesting since it isn't something that can easily be harnessed through vectorization. That is to say, if is only advantageous over ifelse when the cond/test argument is of length 1.
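A small sketch of why the length-1 restriction matters: `if` does not accept a longer condition. The variable name `outcome` is made up for this example, and the behavior is assumed from the R release notes: R 4.2.0 and later signal an error, while older versions warn and use only the first element.

```r
# Pass a length-2 condition to `if` and record how R reacts.
outcome <- tryCatch(
  `if`(c(TRUE, FALSE), "a", "b"),
  warning = function(w) "warning",  # older R: only the first element is used
  error   = function(e) "error"     # R >= 4.2.0: condition must have length 1
)
print(outcome)
```

Either way, `if` never produces one result per element, which is why any vectorized use has to loop over the test vector.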

Consider the following functions, which are an admittedly weak attempt at vectorizing if without the side effect of evaluating both the yes and no arguments, as ifelse does.

ifelse2 <- function(test, yes, no){
  result <- rep(NA, length(test))
  for (i in seq_along(test)){
    result[i] <- `if`(test[i], yes[i], no[i])
  }
  result
}

ifelse2a <- function(test, yes, no){
  sapply(seq_along(test),
         function(i) `if`(test[i], yes[i], no[i]))
}

ifelse3 <- function(test, yes, no){
  result <- rep(NA, length(test))
  logic <- test
  result[logic] <- yes[logic]
  result[!logic] <- no[!logic]
  result
}
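One caveat worth sketching: the logical-subscript assignment used in ifelse3 fails when test contains NA and the replacement has more than one element. The variant below, under the hypothetical name `ifelse3_na`, uses which() to drop NA positions, much as ifelse itself does. It assumes that yes and no have the same length as test, as in the functions above.

```r
# Hypothetical NA-tolerant variant of ifelse3 (illustration only).
# Assumes yes and no have the same length as test.
ifelse3_na <- function(test, yes, no) {
  result <- rep(NA, length(test))
  yes_pos <- which(test)    # positions where test is TRUE (NA dropped)
  no_pos  <- which(!test)   # positions where test is FALSE (NA dropped)
  result[yes_pos] <- yes[yes_pos]
  result[no_pos]  <- no[no_pos]
  result
}

x_na <- c(-1, NA, 2)
out_na <- ifelse3_na(x_na < 0, x_na^2, x_na)
ref_na <- ifelse(x_na < 0, x_na^2, x_na)
```

Positions where test is NA simply stay NA, matching what ifelse returns for the same input.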


set.seed(pi)
x <- rnorm(1000)

library(microbenchmark)
microbenchmark(
  standard = ifelse(x < 0, x^2, x),
  modified = ifelse2(x < 0, x^2, x),
  modified_apply = ifelse2a(x < 0, x^2, x),
  third = ifelse3(x < 0, x^2, x),
  fourth = c(x, x^2)[1L + ( x < 0 )],
  fourth_modified = c(x, x^2)[seq_along(x) + length(x) * (x < 0)]
)

Unit: microseconds
            expr     min      lq      mean  median       uq      max neval cld
        standard  52.198  56.011  97.54633  58.357  68.7675 1707.291   100 ab 
        modified  91.787  93.254 131.34023  94.133  98.3850 3601.967   100  b 
  modified_apply 645.146 653.797 718.20309 661.568 676.0840 3703.138   100   c
           third  20.528  22.873  76.29753  25.513  27.4190 3294.350   100 ab 
          fourth  15.249  16.129  19.10237  16.715  20.9675   43.695   100 a  
 fourth_modified  19.061  19.941  22.66834  20.528  22.4335   40.468   100 a 

SOME EDITS: Thanks to Frank and Richard Scriven for noticing my shortcomings.

As you can see, breaking up the vector so that it can be passed to if is time-consuming and ends up being slower than just running ifelse (which is probably why no one has bothered to implement my solution).

If you're really desperate for an increase in speed, you can use the ifelse3 approach above. Or better yet, Frank's less obvious* but brilliant solution.

  * By 'less obvious' I mean that it took me two seconds to realize what he did. And per nicola's comment below, please note that this works only when yes and no have length 1; otherwise you'll want to stick with ifelse3.
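To make the indexing trick concrete, here is a small worked sketch on a four-element vector. The names `pool`, `idx`, `by_index`, and `naive` are introduced only for illustration. The no values and yes values are concatenated, and the index is shifted by length(x) wherever the test is TRUE.

```r
x <- c(-2, 3, -0.5, 1)
test <- x < 0

# Concatenate: positions 1..4 hold the no values, 5..8 hold the yes values.
pool <- c(x, x^2)

# Offset by length(x) where test is TRUE so each element picks its own yes value.
idx <- seq_along(x) + length(x) * test
by_index <- pool[idx]

# Reference result from ifelse.
reference <- ifelse(test, x^2, x)

# The shorter form 1L + test only ever indexes positions 1 and 2, so it is
# correct only when yes and no each have length 1.
naive <- pool[1L + test]
scalar_case <- c("no", "yes")[1L + test]
```

Here by_index matches the ifelse result, while naive only picks from the first two elements of pool. That is the length-1 restriction mentioned in the footnote.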


Answer 2:

if is a primitive (compiled) function called through the .Primitive interface, while ifelse is R bytecode, so it seems that if will be faster. Running some quick benchmarks:

> microbenchmark(`if`(TRUE, "a", "b"), ifelse(TRUE, "a", "b"))
Unit: nanoseconds
                   expr  min   lq    mean median     uq   max neval cld
 if (TRUE) "a" else "b"   46   54  372.59   60.0   68.0 30007   100  a 
 ifelse(TRUE, "a", "b") 1212 1327 1581.62 1442.5 1617.5 11743   100   b

> microbenchmark(`if`(FALSE, "a", "b"), ifelse(FALSE, "a", "b"))
Unit: nanoseconds
                    expr  min   lq    mean median   uq   max neval cld
 if (FALSE) "a" else "b"   47   55   91.64   61.5   73  2550   100  a 
 ifelse(FALSE, "a", "b") 1256 1346 1688.78 1460.0 1677 17260   100   b

It seems that, leaving aside the code in the actual branches, if is at least 20x faster than ifelse. However, note that this doesn't account for the complexity of the expression being tested or for possible optimizations of it.
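The primitive-versus-closure distinction can be inspected directly with base R. This is only a quick check of how the two functions are implemented, not a benchmark; the variable names are made up for the example.

```r
# `if` is a primitive implemented in C; ifelse is an ordinary R closure.
if_is_primitive     <- is.primitive(`if`)
ifelse_is_primitive <- is.primitive(ifelse)
if_type     <- typeof(`if`)     # "special": arguments are not evaluated up front
ifelse_type <- typeof(ifelse)   # "closure"
```

Printing ifelse at the console shows its R source, including the argument handling that contributes to the per-call overhead seen in the scalar benchmark.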

Update: Please note that this quick benchmark represents a very simplified and somewhat biased use case of if vs ifelse (as pointed out in the comments). While it is correct, it underrepresents the use cases of ifelse; for those, Benjamin's answer seems to provide a fairer comparison.