Assignment to subset of a matrix with repeated ind

Posted 2019-06-22 09:52

Question:

Not sure this qualifies for an entry in the R-Inferno, but can someone comment on the logic behind the way the following replacement works?

foo<-matrix(1:6,2)
bar<-foo[2,c(1,3,1)]
bar
# [1] 2 6 2
foo[2,c(1,3,1)]<-foo[2,c(1,3,1)]+5
foo
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    7    4   11

My question is: when generating bar, the repeated index produces a repeated element in the output, but when modifying foo, the repeated index does not result in a repeated addition. (By comparison, for(j in c(1,3,1)) foo[2,j]<-foo[2,j]+5 does.) Why, and how exactly, does [<- essentially ignore the repeated index?

Answer 1:

From help("[<-"):

Subassignment is done sequentially, so if an index is specified more than once the latest assigned value for an index will result.

foo<-matrix(1:6,2)

foo[1,rep(1,2)] <- c(1,42)
foo
#     [,1] [,2] [,3]
#[1,]   42    3    5
#[2,]    2    4    6
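
Applied to the example from the question, the documented behaviour can be sketched as follows. This is only an illustration, not how R implements [<- internally; the names idx, rhs and manual are made up for the example. The right-hand side is computed once, before anything is written, and the values are then stored one index at a time, so the second write to column 1 overwrites the first with the same value.

```r
foo <- matrix(1:6, 2)
idx <- c(1, 3, 1)

# The right-hand side is evaluated once, from the unmodified foo: 7 11 7
rhs <- foo[2, idx] + 5

# Emulate "subassignment is done sequentially" by hand (illustrative only)
manual <- foo
for (k in seq_along(idx)) manual[2, idx[k]] <- rhs[k]

# The vectorised replacement gives the same matrix
foo[2, idx] <- rhs
identical(foo, manual)  # TRUE
foo[2, ]                # 7 4 11
```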


Answer 2:

To try to answer the secondary question in the comments indirectly:

> library(microbenchmark)
> vec <- 1:10
> microbenchmark(
+       rep(1, 1e4),
+       vec[rep(1, 1e4)] <- 1:1e4,
+       vec[1] <- 1e4
+     )
Unit: microseconds
                          expr     min       lq   median       uq      max neval
                 rep(1, 10000)  16.457  17.9190  18.2860  19.0170 2561.327   100
 vec[rep(1, 10000)] <- 1:10000 215.395 219.7835 227.8285 233.6795 3437.532   100
               vec[1] <- 10000   1.463   2.1950   3.2920   3.8405   22.308   100

These timings strongly suggest that values are written to the same memory location over and over, so that only the last one prevails. They are not added because the operation here is an overwrite, not an addition (though maybe that was not what you were asking with "does not result in a repeated addition operation").

Note that your loop and your direct assignment are not equivalent: the loop reads, adds, and assigns, then re-reads, re-adds, and re-assigns, and so on, whereas the direct assignment reads once, adds 5 to that single vector once, and then keeps only the last value written to each position.
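
A small side-by-side run makes the difference concrete. The names direct, looped and counted are hypothetical, and the tabulate() variant at the end is just one possible way to get the loop's accumulating result without a loop, offered as a sketch rather than a recommendation.

```r
idx <- c(1, 3, 1)

# Direct assignment: one read, one addition, then overwriting writes
direct <- matrix(1:6, 2)
direct[2, idx] <- direct[2, idx] + 5
direct[2, ]   # 7 4 11

# Loop: column 1 is re-read after it has already been updated
looped <- matrix(1:6, 2)
for (j in idx) looped[2, j] <- looped[2, j] + 5
looped[2, ]   # 12 4 11

# Possible vectorised equivalent of the loop: count how often each
# column occurs in idx and add 5 that many times
counted <- matrix(1:6, 2)
counted[2, ] <- counted[2, ] + 5 * tabulate(idx, nbins = ncol(counted))
counted[2, ]  # 12 4 11
```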

The key difference between reading and writing is that a read is expected to return a vector as long as the index vector, whereas a write cannot leave more elements behind than the target vector has (excluding the case where you are using out-of-bounds indices).
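
The length asymmetry is easy to check directly. In this sketch the names vec, idx, picked and grown are illustrative; the last few lines show the out-of-bounds case mentioned above, where assigning past the end does extend the vector.

```r
vec <- 1:10
idx <- rep(1, 5)

# Reading: one output element per index, repeats included
picked <- vec[idx]
length(picked)  # 5

# Writing: five values go to the same slot, the vector keeps its length
vec[idx] <- 101:105
length(vec)     # 10
vec[1]          # 105, the last value assigned

# Out-of-bounds index: the target grows and the gap is filled with NA
grown <- 1:3
grown[5] <- 9L
length(grown)   # 5
grown[4]        # NA
```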



Tags: r subset