Say I want to subset a vector a. I can pass the indices to subset in a variable, e.g. a[idx].
What value should I set idx to in order to get the whole of a (i.e. the equivalent of a[])?
Basically I have a function with idx as the argument, and would like to pass a value to process the whole dataset. I'm assuming there should be something better than 1:length(a).
The index argument in subsetting is allowed to be "missing" (see ?"["):
ff1 = function(x, i) x[i]
ff2 = function(x, i = TRUE) x[i]
ff3 = function(x, i = seq_along(x)) x[i]
ff4 = function(x, i = substitute()) x[i]
a = sample(10)
a
# [1] 3 8 2 6 9 7 5 1 4 10
ff1(a)
# [1] 3 8 2 6 9 7 5 1 4 10
ff2(a)
# [1] 3 8 2 6 9 7 5 1 4 10
ff3(a)
# [1] 3 8 2 6 9 7 5 1 4 10
ff4(a)
# [1] 3 8 2 6 9 7 5 1 4 10
a = runif(1e6)
identical(ff1(a), ff2(a))
#[1] TRUE
identical(ff1(a), ff3(a))
#[1] TRUE
identical(ff1(a), ff4(a))
#[1] TRUE
microbenchmark::microbenchmark(ff1(a), ff2(a), ff3(a), ff4(a), times = 25)
#Unit: milliseconds
#   expr       min        lq    median        uq       max neval
# ff1(a)  2.026772  2.131173  2.207037  2.930885  3.789409    25
# ff2(a) 12.091727 12.151931 12.318625 12.740057 16.829445    25
# ff3(a)  8.930464  9.104118  9.454557  9.643175 13.131213    25
# ff4(a)  2.024684  2.090108  2.156577  2.289166  3.496391    25
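If the function does more than index, the "missing" idea can also be made explicit with missing(). This is only a minimal sketch: process_subset() and its argument names are hypothetical and not part of the functions benchmarked above.

```r
# Hypothetical helper (name and arguments are illustrative only).
# When idx is not supplied, the whole vector is processed.
process_subset <- function(x, idx) {
  if (missing(idx)) {
    selected <- x        # no index given: take everything
  } else {
    selected <- x[idx]   # index given: take only those elements
  }
  sum(selected)
}

a <- c(3, 8, 2, 6)
total_all  <- process_subset(a)       # sums all four elements
total_some <- process_subset(a, 2:3)  # sums a[2] and a[3]
```

The explicit branch is a design choice for readability; as ff1 shows, plain x[i] with a missing i already returns the whole vector.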
You can use a small hack: set idx to TRUE. A logical index is recycled to the length of the vector, so every element is selected:
a[TRUE]
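One caveat worth checking before relying on this: a[TRUE] and a[] are not always interchangeable. A small sketch with made-up data (the objects a and m below are illustrative), assuming base R only:

```r
# Named vector: a logical TRUE index is recycled and the names are kept.
a <- c(x = 3, y = 8, z = 2)
idx <- TRUE
same_vector <- identical(a[idx], a)

# Matrix: m[TRUE] is vector-style indexing, so the dim attribute is dropped,
# while the empty index m[] returns the matrix unchanged.
m <- matrix(1:6, nrow = 2)
dim_dropped <- is.null(dim(m[TRUE]))
dim_kept    <- identical(m[], m)
```

So for plain vectors the hack behaves like a[], but for objects with a dim attribute the two give different shapes.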
The answer by @ahmohamed is correct and a very concise approach to the problem. In case you are working with a large dataset, let me illustrate the performance difference between recycling a logical vector and using a numeric index:
a <- sample(1e6, 1e7, TRUE)
library(microbenchmark)
microbenchmark(a[TRUE], a[seq_along(a)])
#Unit: milliseconds
#            expr       min        lq    median       uq       max neval
#         a[TRUE] 238.10089 254.63311 261.03451 287.7352 1163.8499   100
# a[seq_along(a)]  64.49373  95.48278  98.00964 142.4811  709.2872   100
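A side note on why seq_along(a) is used here rather than the 1:length(a) from the question: the two differ for zero-length input. A minimal sketch (the variable names are illustrative):

```r
empty <- numeric(0)

bad_idx  <- 1:length(empty)   # 1:0 counts down, giving c(1L, 0L)
good_idx <- seq_along(empty)  # integer(0)

n_bad  <- length(empty[bad_idx])   # index 1 is out of range and yields an NA
n_good <- length(empty[good_idx])  # nothing selected
n_true <- length(empty[TRUE])      # nothing selected
```

With an empty vector, 1:length(a) silently produces a spurious NA element, whereas seq_along(a) and TRUE both return an empty result.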