When I was re-reading Hadley's Advanced R recently, I noticed that he said in Chapter 6 that `if` can be used as a function, like this:
`if`(i == 1, print("yes"), print("no"))
(If you have the physical book in hand, it's on page 80.)
We know that ifelse is slow (see "Does ifelse really calculate both of its vectors every time? Is it slow?") as it evaluates all arguments. Would `if` be a good alternative, given that `if` seems to evaluate only the argument for the branch that is taken (this is just my assumption)?
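The evaluation behaviour asked about here can be checked directly. The following is a minimal sketch, not taken from the book: the helper names yes_branch and no_branch and the calls counter are made up for illustration, and the ifelse results reflect base R's current implementation as I understand it.

```r
# Counters record how often each branch expression is actually evaluated.
calls <- c(yes = 0, no = 0)
yes_branch <- function() { calls["yes"] <<- calls["yes"] + 1; "yes" }
no_branch  <- function() { calls["no"]  <<- calls["no"]  + 1; "no" }

# `if` called as a function: only the selected branch is evaluated.
i <- 1
res_if <- `if`(i == 1, yes_branch(), no_branch())
calls_after_if <- calls

# ifelse with a mixed test vector: both branches are evaluated.
calls[] <- 0
res_ifelse <- ifelse(c(TRUE, FALSE), yes_branch(), no_branch())
calls_after_ifelse <- calls

# ifelse with an all-TRUE test: the `no` argument is never forced.
calls[] <- 0
res_all_true <- ifelse(c(TRUE, TRUE), yes_branch(), no_branch())
calls_after_all_true <- calls
```

So ifelse evaluates yes and no in full whenever the test contains at least one TRUE and at least one FALSE, while `if` only ever touches the branch it returns.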
Update: Based on the answers from @Benjamin and @Roman and the comments from @Gregor and many others, ifelse seems to be the better solution for vectorized calculations. I'm accepting @Benjamin's answer because it provides a more comprehensive comparison, which benefits the community. However, both answers (and the comments) are worth reading.
This is more of an extended comment building on Roman's answer, but I need code formatting to expound:
Roman is correct that `if` is faster than ifelse, but I am under the impression that the speed boost of `if` isn't particularly interesting, since it isn't something that can easily be harnessed through vectorization. That is to say, `if` is only advantageous over ifelse when the cond/test argument is of length 1.
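To make the length-1 restriction concrete, here is a small sketch (the variable names are illustrative). What `if` does with a longer condition depends on the R version: as far as I know, R 4.2.0 and later raise an error, while earlier versions warn and use only the first element, so the code catches both.

```r
cond <- c(TRUE, FALSE)

# `if` expects a single TRUE/FALSE. Depending on the R version, a longer
# condition either raises an error or warns and uses only the first element,
# so both outcomes are caught here.
if_outcome <- tryCatch(
  `if`(cond, "yes", "no"),
  error   = function(e) "error",
  warning = function(w) "warning"
)

# ifelse handles the whole vector in one call.
ifelse_result <- ifelse(cond, "yes", "no")
```

Either way, `if` cannot process the vector element by element on its own, which is why any vectorized use has to wrap it in a loop.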
Consider the following functions, which are an admittedly weak attempt at vectorizing `if` without the side effect of evaluating both the yes and no conditions, as ifelse does.
# Loop over the elements and call `if` once per element
ifelse2 <- function(test, yes, no){
  result <- rep(NA, length(test))
  for (i in seq_along(test)){
    result[i] <- `if`(test[i], yes[i], no[i])
  }
  result
}

# Same idea, using sapply instead of an explicit loop
ifelse2a <- function(test, yes, no){
  sapply(seq_along(test),
         function(i) `if`(test[i], yes[i], no[i]))
}

# No `if` at all: fill the result by logical subsetting
ifelse3 <- function(test, yes, no){
  result <- rep(NA, length(test))
  logic <- test
  result[logic] <- yes[logic]
  result[!logic] <- no[!logic]
  result
}
set.seed(pi)
x <- rnorm(1000)

library(microbenchmark)
microbenchmark(
  standard = ifelse(x < 0, x^2, x),
  modified = ifelse2(x < 0, x^2, x),
  modified_apply = ifelse2a(x < 0, x^2, x),
  third = ifelse3(x < 0, x^2, x),
  # scalar-only trick: with vector yes/no it reads only positions 1 and 2
  fourth = c(x, x^2)[1L + ( x < 0 )],
  # offset version that handles vector yes/no
  fourth_modified = c(x, x^2)[seq_along(x) + length(x) * (x < 0)]
)
Unit: microseconds
            expr     min      lq      mean  median       uq      max neval cld
        standard  52.198  56.011  97.54633  58.357  68.7675 1707.291   100  ab
        modified  91.787  93.254 131.34023  94.133  98.3850 3601.967   100   b
  modified_apply 645.146 653.797 718.20309 661.568 676.0840 3703.138   100    c
           third  20.528  22.873  76.29753  25.513  27.4190 3294.350   100  ab
          fourth  15.249  16.129  19.10237  16.715  20.9675   43.695   100  a
 fourth_modified  19.061  19.941  22.66834  20.528  22.4335   40.468   100  a
SOME EDITS: Thanks to Frank and Richard Scriven for noticing my shortcomings.
As you can see, breaking up the vector so that it can be passed to `if` is time-consuming and ends up being slower than just running ifelse (which is probably why no one has bothered to implement my solution).
If you're really desperate for an increase in speed, you can use the ifelse3 approach above. Or better yet, Frank's less obvious* but brilliant solution.
* By 'less obvious' I mean that it took me two seconds to realize what he did. And per nicola's comment below, please note that this works only when yes and no have length 1; otherwise you'll want to stick with ifelse3.
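Since the indexing trick in the fourth and fourth_modified benchmark entries is easy to misread, here is a small worked sketch on a four-element vector. The variable names are mine and chosen for illustration only.

```r
x <- c(-2, 3, -0.5, 1)
n <- length(x)

# Scalar yes/no: 1L + test is 1 for FALSE and 2 for TRUE,
# so it picks "no" or "yes" out of c(no, yes).
scalar_pick <- c("non-negative", "negative")[1L + (x < 0)]

# Vector yes/no: stack no and yes into one vector of length 2n.
# Position i holds no[i] and position n + i holds yes[i], so adding n
# whenever the test is TRUE selects the matching element of yes.
stacked     <- c(x, x^2)
offset_pick <- stacked[seq_along(x) + n * (x < 0)]

# The scalar form applied to vector yes/no only ever reads positions 1 and 2.
naive_pick <- stacked[1L + (x < 0)]

reference <- ifelse(x < 0, x^2, x)
```

Here offset_pick agrees with the ifelse reference, while naive_pick just alternates between x[2] and x[1]. That is the limitation nicola pointed out, and it means the fourth entry in the benchmark measures speed but does not compute the same result as the others.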
`if` is a primitive (compiled) function called through the .Primitive interface, while ifelse is written in R and runs as bytecode, so it seems that `if` will be faster. Running some quick benchmarks:
> microbenchmark(`if`(TRUE, "a", "b"), ifelse(TRUE, "a", "b"))
Unit: nanoseconds
                   expr  min   lq    mean median     uq   max neval cld
 if (TRUE) "a" else "b"   46   54  372.59   60.0   68.0 30007   100  a
 ifelse(TRUE, "a", "b") 1212 1327 1581.62 1442.5 1617.5 11743   100   b
> microbenchmark(`if`(FALSE, "a", "b"), ifelse(FALSE, "a", "b"))
Unit: nanoseconds
                    expr  min   lq    mean median   uq   max neval cld
 if (FALSE) "a" else "b"   47   55   91.64   61.5   73  2550   100  a
 ifelse(FALSE, "a", "b") 1256 1346 1688.78 1460.0 1677 17260   100   b
Ignoring the code in the branches themselves, `if` appears to be at least 20x faster than ifelse. However, note that this doesn't account for the complexity of the expression being tested or for possible optimizations of it.
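The difference in how the two functions are implemented can be inspected from the console. This is a small sketch using only base R; the variable names are illustrative.

```r
# `if` is implemented in C as a primitive; ifelse is a regular R function.
if_is_primitive     <- is.primitive(`if`)
ifelse_is_primitive <- is.primitive(ifelse)

if_type     <- typeof(`if`)    # "special": arguments are not evaluated up front
ifelse_type <- typeof(ifelse)  # "closure": an ordinary function written in R
```

Part of the per-call gap in the benchmarks above is therefore plain function-call overhead plus the extra checks ifelse performs in R code before it returns.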
Update: Please note that this quick benchmark represents a very simplified and somewhat biased use case of `if` vs ifelse (as pointed out in the comments). While it is correct, it underrepresents the ifelse use cases; for those, Benjamin's answer seems to provide a fairer comparison.