The title is self-explanatory. I would like to know why R chose to recycle logical (boolean) values used for selection/subsetting.
The documentation for "[" says of the i, j arguments: "Such vectors are recycled if necessary to match the corresponding extent."
Are there any advantages to doing this? I can think of one, shown below, but I suspect the disadvantages outweigh the ease-of-use benefit.
df <- data.frame(C1 = 1:10, c2 = 101:110)
class(unclass(df)[1]) # "list": underneath, df is a list of two vectors, one per column
df
df[1] # selects the first list element (i.e., the first column)
df[2]
# However, logical indices are recycled
df[TRUE] # selects both columns
df[c(T,T),] # recycled row indices
df[c(T,T,F),] # recycled row indices
df[FALSE]
# This can actually lead to inadvertent errors
# For example, this index has only 7 elements instead of 10,
# but it is easy to miss that it is being recycled
df[c(T,F,T,T,F,F,F),]
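To make the recycling in that last call visible, here is a small sketch. It expands the 7-element index with rep_len() to show which rows end up selected, and wraps the subsetting in strict_rows(), a hypothetical helper of my own (not part of base R) that refuses an index whose length differs from the number of rows.

```r
df <- data.frame(C1 = 1:10, c2 = 101:110)
idx <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)

# rep_len() spells out the recycling that "[" performs silently
recycled <- rep_len(idx, nrow(df))
which(recycled) # 1 3 4 8 10

# strict_rows() is a hypothetical helper, not a base R function:
# it stops unless the logical index has exactly one entry per row
strict_rows <- function(data, keep) {
  stopifnot(is.logical(keep), length(keep) == nrow(data))
  data[keep, , drop = FALSE]
}

# the short index is rejected instead of being recycled
res <- tryCatch(strict_rows(df, idx), error = function(e) "length mismatch")
res
```

A guard like this trades the convenience of recycling for an explicit failure, which is the behaviour I would have expected by default.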
The only use of this recycling feature that I can think of is skipping alternate rows:
df[c(T,F),]
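For what it's worth, the same selection can be written without relying on recycling. This is only a sketch for comparison; the variable names are mine.

```r
df <- data.frame(C1 = 1:10, c2 = 101:110)

# recycled two-element pattern: keep one row, drop the next, repeat
odd_recycled <- df[c(TRUE, FALSE), ]

# the same rows requested by position, with no recycling involved
odd_explicit <- df[seq(1, nrow(df), by = 2), ]

odd_recycled$C1
odd_explicit$C1
```

The positional version is longer, but the length of the index no longer has to be compared against the number of rows.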
The context for this question is another one I saw on SO yesterday. It was later deleted after someone pointed out the difference between | and ||. I wonder if they realised they were also dealing with recycling here.
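# An erroneous use of && can land you in trouble too
# In R < 4.3 the condition collapses to a single TRUE, which is recycled to select all rows;
# from R 4.3.0 onwards, && with an operand of length > 1 is an error instead
df[df$C1 > 0 && df$c2 < 102, ]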
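For comparison, a minimal sketch of what I assume was intended there: the element-wise & returns one logical value per row, so nothing needs to be recycled, whereas a single TRUE is stretched across every row.

```r
df <- data.frame(C1 = 1:10, c2 = 101:110)

# element-wise & yields one logical per row, so the index already has full length
keep <- df$C1 > 0 & df$c2 < 102
length(keep) == nrow(df)
df[keep, ]

# a length-one TRUE, by contrast, is recycled and selects every row
nrow(df[TRUE, ]) == nrow(df)
```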
Are there any other well-known pitfalls of this nature that one should be wary of?