Passing an empty index in R

Posted 2019-07-21 05:22

Question:

Say I want to subset a vector a. I can pass the indices in a variable, e.g. a[idx].

What value should I give idx to get the whole of a, i.e. the equivalent of a[]?

Basically, I have a function that takes idx as an argument, and I would like to pass a value that makes it process the whole dataset. I assume there is something better than 1:length(a).

Answer 1:

The index argument in subsetting is allowed to be "missing" (see ?"["):

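ff1 = function(x, i) x[i]                  # no default: i stays missing, so x[i] acts like x[]
ff2 = function(x, i = TRUE) x[i]           # logical TRUE is recycled over all of x
ff3 = function(x, i = seq_along(x)) x[i]   # explicit integer index 1..length(x)
ff4 = function(x, i = substitute()) x[i]   # default yields the empty symbol, so i is treated as missing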

a = sample(10)
a
# [1]  3  8  2  6  9  7  5  1  4 10
ff1(a)
# [1]  3  8  2  6  9  7  5  1  4 10
ff2(a)
# [1]  3  8  2  6  9  7  5  1  4 10
ff3(a)
# [1]  3  8  2  6  9  7  5  1  4 10
ff4(a)
# [1]  3  8  2  6  9  7  5  1  4 10


a = runif(1e6)
identical(ff1(a), ff2(a))
#[1] TRUE
identical(ff1(a), ff3(a))
#[1] TRUE
identical(ff1(a), ff4(a))
#[1] TRUE
microbenchmark::microbenchmark(ff1(a), ff2(a), ff3(a), ff4(a), times = 25)
#Unit: milliseconds
#   expr       min        lq    median        uq       max neval
# ff1(a)  2.026772  2.131173  2.207037  2.930885  3.789409    25
# ff2(a) 12.091727 12.151931 12.318625 12.740057 16.829445    25
# ff3(a)  8.930464  9.104118  9.454557  9.643175 13.131213    25
# ff4(a)  2.024684  2.090108  2.156577  2.289166  3.496391    25
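
To connect this back to the question, here is a minimal sketch of a function that processes the whole vector when no index is given. The name process_subset and the use of sum() are hypothetical, chosen only for illustration. The explicit missing() check is not strictly needed, since a[idx] already handles a missing idx as shown above, but it makes the intent visible:

```r
# Hypothetical helper: sum a subset of `a`, or all of `a` when idx is omitted.
process_subset <- function(a, idx) {
  # missing(idx) is TRUE when the caller supplied no idx argument
  selected <- if (missing(idx)) a else a[idx]
  sum(selected)
}

a <- c(3, 8, 2, 6)
total_all  <- process_subset(a)           # uses the whole vector
total_some <- process_subset(a, c(1, 3))  # uses elements 1 and 3 only
```

Leaving the argument without a default keeps the fast path measured for ff1 and ff4, because no index vector has to be built.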


Answer 2:

You can use a small hack: set idx to TRUE.

a[TRUE]
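
This works because a logical index shorter than the vector is recycled to the vector's full length. A small sketch with made-up values shows the same rule at work:

```r
a <- c(10, 20, 30, 40, 50)

# A length-1 TRUE is recycled to five TRUEs, so every element is kept
whole <- a[TRUE]

# The same recycling applies to longer logical indices
every_other <- a[c(TRUE, FALSE)]   # keeps positions 1, 3 and 5

# FALSE is recycled as well, so nothing is selected
none <- a[FALSE]
```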



Answer 3:

The answer by @ahmohamed is correct and very concise. If you are working with a large dataset, though, note the performance difference between recycling a logical vector and using a numeric index:

a <- sample(1e6, 1e7, TRUE)
library(microbenchmark)
microbenchmark(a[TRUE], a[seq_along(a)])

#Unit: milliseconds
#            expr       min        lq    median       uq       max neval
#         a[TRUE] 238.10089 254.63311 261.03451 287.7352 1163.8499   100
# a[seq_along(a)]  64.49373  95.48278  98.00964 142.4811  709.2872   100
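
The numeric index here is built with seq_along(a) rather than the 1:length(a) mentioned in the question. A short sketch of why that matters, assuming the function might one day receive an empty vector:

```r
empty <- integer(0)

idx_colon <- 1:length(empty)    # 1:0 counts down, giving c(1L, 0L)
idx_seq   <- seq_along(empty)   # integer(0)

res_colon <- empty[idx_colon]   # index 1 is out of range, so an NA appears
res_seq   <- empty[idx_seq]     # stays empty, as intended
```

For non-empty vectors the two forms select the same elements; they differ only in this edge case.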


Tags: r subset