R: rationale for recycling boolean indices for selection

Published 2019-09-16 02:08


The title is self-explanatory: I would like to know why R chose to recycle logical values used for selection/subsetting. The documentation for "[" says of the indices i, j, ...: "Such vectors are recycled if necessary to match the corresponding extent."

Are there any advantages to doing this? I can think of one, mentioned below, but I suspect the disadvantages outweigh the convenience.

df <- data.frame(C1 = 1:10, c2 = 101:110)
class(unclass(df)[1]) # "list": a data frame is a list of columns (here two integer vectors)
df[1] # selects the 1st list element, i.e. the first column

# However, logical indices are recycled
df[TRUE] # selects both columns
df[c(TRUE, TRUE), ] # recycled row index: selects all 10 rows
df[c(TRUE, TRUE, FALSE), ] # recycled row index: selects 7 of the 10 rows

# This can lead to inadvertent errors.
# For example, this has only 7 index elements instead of 10,
# but it is easy to miss the fact that they are being recycled
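To see exactly which rows the three-element row index c(T,T,F) (that is, c(TRUE, TRUE, FALSE)) keeps, the recycling can be reproduced by hand. This is a minimal sketch: `rep_len()` is used only to illustrate the effect of recycling, not as a claim about how "[" is implemented internally.

```r
df <- data.frame(C1 = 1:10, c2 = 101:110)

# Expand the 3-element logical index to the 10 rows of df, mimicking recycling
idx <- rep_len(c(TRUE, TRUE, FALSE), nrow(df))
print(idx)  # TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE

# Row numbers that survive: every third row is silently dropped
kept <- which(idx)
print(kept)  # 1 2 4 5 7 8 10

# The short index and the explicitly expanded one select the same rows
short_sel <- df[c(TRUE, TRUE, FALSE), ]
full_sel  <- df[idx, ]
print(identical(short_sel, full_sel))
```

No warning is issued for the short index, which is why the dropped rows are easy to overlook.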

The only use of this recycling feature I could think of is skipping alternate rows.
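A small sketch of that use, comparing the recycled logical index with an explicit sequence of row numbers that does not rely on recycling at all (the variable names are illustrative only):

```r
df <- data.frame(C1 = 1:10, c2 = 101:110)

# c(TRUE, FALSE) is recycled to TRUE FALSE TRUE FALSE ..., keeping the odd rows
odd_rows <- df[c(TRUE, FALSE), ]
print(odd_rows$C1)  # 1 3 5 7 9

# Same selection with explicit row numbers, independent of recycling
odd_rows_explicit <- df[seq(1, nrow(df), by = 2), ]
print(all.equal(odd_rows, odd_rows_explicit))
```

The explicit form is longer but states the intent directly, so a wrong index length cannot go unnoticed in the same way.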


The context for this question is another one I saw on SO yesterday. It was later deleted after someone pointed out the difference between | and ||. I wonder whether they realised they were also dealing with recycling there.

   # An erroneous use of && can land you in soup too
   df[df$C1 > 0 && df$c2 < 102, ] # before R 4.3.0: TRUE, so all rows are selected
   # (since R 4.3.0, && with an operand of length > 1 is an error)
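For comparison, a sketch of what was presumably intended: the vectorised `&` yields one logical value per row, so the index already has the full length and no recycling takes place. `&&` is shown where it belongs, combining single TRUE/FALSE values such as the condition of an `if`.

```r
df <- data.frame(C1 = 1:10, c2 = 101:110)

# `&` is element-wise: one TRUE/FALSE per row of df
cond <- df$C1 > 0 & df$c2 < 102
print(length(cond))  # 10

# Only the first row (c2 == 101) satisfies both conditions
sel <- df[cond, ]
print(sel)

# `&&` combines length-1 values, e.g. in the condition of an if ()
if (nrow(df) > 0 && all(df$C1 > 0)) message("all C1 values are positive")
```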

Are there any other well-known pitfalls of this kind that one should be wary of?
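One defensive option, sketched here under the assumption that silent recycling of row indices is never wanted in a given piece of code, is to check the length of a logical index before subsetting. The helper `strict_logical_rows()` is hypothetical, not part of base R or any package.

```r
# Hypothetical helper: rejects logical row indices whose length differs from nrow(df)
strict_logical_rows <- function(df, i) {
  if (is.logical(i) && length(i) != nrow(df)) {
    stop(sprintf("logical index has length %d but data frame has %d rows",
                 length(i), nrow(df)))
  }
  df[i, , drop = FALSE]
}

df <- data.frame(C1 = 1:10, c2 = 101:110)

# Full-length index: accepted (rows with even C1)
ok <- strict_logical_rows(df, df$C1 %% 2 == 0)
print(nrow(ok))  # 5

# Short index: rejected instead of being silently recycled
res <- tryCatch(strict_logical_rows(df, c(TRUE, TRUE, FALSE)),
                error = function(e) conditionMessage(e))
print(res)
```

Non-logical indices (row numbers, row names) are passed through unchanged, since recycling does not apply to them in the same way.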



It lets you select every nth element of a vector, or every nth row or column of a data.frame or matrix:

> m <- matrix(1:20, 4)
> m[c(TRUE,FALSE), ]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    3    7   11   15   19
> m[, c(TRUE,FALSE) ]
     [,1] [,2] [,3]
[1,]    1    9   17
[2,]    2   10   18
[3,]    3   11   19
[4,]    4   12   20

Every third column:

> m[, c(TRUE,FALSE,FALSE)]
     [,1] [,2]
[1,]    1   13
[2,]    2   14
[3,]    3   15
[4,]    4   16
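The every-nth pattern generalises to a logical vector with one TRUE followed by n - 1 FALSE values. A minimal sketch, where `every_nth()` is a hypothetical helper (it assumes n >= 1), checked against explicit column positions:

```r
m <- matrix(1:20, 4)

# Hypothetical helper: TRUE followed by n - 1 FALSE values.
# Recycled over an extent, it keeps positions 1, 1 + n, 1 + 2n, ...
every_nth <- function(n) c(TRUE, rep(FALSE, n - 1))

# Every third column via recycling: columns 1 and 4
by_recycling <- m[, every_nth(3)]

# The same columns with explicit positions, independent of recycling
by_position <- m[, seq(1, ncol(m), by = 3)]

print(identical(by_recycling, by_position))  # TRUE
```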

The cited disadvantage is really an incorrect use of the && operator (which I think you do realize). That operator only ever returns a length-1 vector, so it is generally inappropriate for building an index. The same confusion probably affected the questioner who used the || operator.

Ultimately, the answer is that the authors liked it that way. R follows the semantics of S in most respects, and S was designed at AT&T's Bell Laboratories in the mid-1970s, fairly early in the history of high-level languages.