How do I manipulate/access elements of an instance of "dist"

Posted 2020-05-20 09:18

A basic/common class in R is called "dist", and is a relatively efficient representation of a symmetric distance matrix. Unlike a "matrix" object, however, there does not seem to be support for manipulating a "dist" instance by index pairs using the "[" operator.

For example, the following code returns nothing, NULL, or an error:

# First, create an example dist object from a matrix
mat1  <- matrix(1:100, 10, 10)
rownames(mat1) <- 1:10
colnames(mat1) <- 1:10
dist1 <- as.dist(mat1)
# Now try to access index features, or index values
names(dist1)
rownames(dist1)
row.names(dist1)
colnames(dist1)
col.names(dist1)
dist1[1, 2]

Meanwhile, the following commands do work, in some sense, but do not make it any easier to access/manipulate particular index-pair values:

dist1[1] # R thinks of it as a vector, not a matrix?
attributes(dist1)
attributes(dist1)$Diag <- FALSE
mat2 <- as(dist1, "matrix")
mat2[1, 2] <- 0

A workaround, which I want to avoid, is to first convert the "dist" object to a "matrix", manipulate that matrix, and then convert it back to "dist". In other words, this is not a question about how to convert a "dist" instance into a "matrix" or some other class where common matrix-indexing tools are already defined; that has been answered in several ways in a different SO question.

Are there tools in the stats package (or perhaps some other core R package) dedicated to indexing/accessing elements of an instance of "dist"?

12 answers
做个烂人
Reply #2 · 2020-05-20 09:19

It seems dist objects are treated pretty much the same way as simple vector objects. As far as I can see, a dist object is a vector with attributes. So to get the values out:

x = as.vector(distobject)

See ?dist for a formula to extract the distance between a specific pair of objects using their indices.
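As a minimal sketch of the "vector with attributes" view, the following builds a tiny dist object and pulls out its raw values and size. The names m, d0, vals and n_obs are illustrative only and do not come from the question.

```r
# Small example: 4 observations, so the dist vector has 4 * 3 / 2 = 6 entries
m <- matrix(c(0, 3, 6, 10), ncol = 1)
d0 <- dist(m)

vals <- as.vector(d0)      # plain numeric vector, attributes dropped
n_obs <- attr(d0, "Size")  # number of observations

length(vals)               # 6
vals                       # 3 6 10 3 7 4 (lower triangle, column by column)
```

The values come out column by column from the lower triangle, which is what makes a position formula possible.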

仙女界的扛把子
Reply #3 · 2020-05-20 09:19

Converting to a matrix was also out of the question for me, because the resulting matrix would be 35K by 35K, so I left it as a vector (the result of dist) and wrote a function to find the place in the vector where the distance should be:

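distXY <- function(X, Y, n) {
  # Order the pair so that A < B; only the lower triangle is stored
  A <- min(X, Y)
  B <- max(X, Y)

  # Closed form of (A-1)*n - (1 + 2 + ... + (A-1)) + B - A.
  # The closed form also handles A == 1, where 1:(A-1) would expand to c(1, 0).
  (A - 1) * n - A * (A - 1) / 2 + B - A
}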

Here X and Y are the row numbers, in the matrix from which you calculated dist, of the two elements you want to compare, and n is the total number of rows (observations) in that matrix. The result is the position in the dist vector where the distance will be. X and Y must differ, since the diagonal is not stored. I hope it makes sense.

【Aperson】
Reply #4 · 2020-05-20 09:26

as.matrix(d) will turn the dist object d into a matrix, while as.dist(m) will turn the matrix m back into a dist object. Note that the latter doesn't actually check that m is a valid distance matrix; it just extracts the lower triangular part.
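A small sketch of that caveat, using a deliberately asymmetric matrix. The names m, d_low and back are made up for illustration.

```r
# A deliberately asymmetric matrix: the lower triangle holds 1s, the upper triangle 9s
m <- matrix(9, 3, 3)
m[lower.tri(m)] <- 1
diag(m) <- 0

d_low <- as.dist(m)          # the asymmetry is not checked
back <- as.matrix(d_low)     # symmetric again, rebuilt from the lower triangle only

back[1, 2]                   # 1, not the 9 that was stored in m[1, 2]
```

So a round trip through as.dist silently discards whatever was in the upper triangle.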

Bombasti
Reply #5 · 2020-05-20 09:26

You may find this useful [from ?dist]:

The lower triangle of the distance matrix stored by columns in a vector, say ‘do’. If ‘n’ is the number of observations, i.e., ‘n <- attr(do, "Size")’, then for i < j <= n, the dissimilarity between (row) i and j is ‘do[n*(i-1) - i*(i-1)/2 + j-i]’. The length of the vector is n*(n-1)/2, i.e., of order n^2.
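The quoted formula can be tried out on a small example. This is only a sketch: dist_index is a hypothetical helper name, not a function from stats, and it is valid only for i < j <= n, as the documentation states.

```r
# Hypothetical helper implementing the ?dist formula; valid only for i < j <= n
dist_index <- function(i, j, n) n * (i - 1) - i * (i - 1) / 2 + j - i

set.seed(1)
pts <- matrix(rnorm(10), ncol = 2)   # 5 observations in 2 dimensions
do <- dist(pts)
n <- attr(do, "Size")

k <- dist_index(2, 4, n)              # position of the (2, 4) entry
k                                     # 5*1 - 1 + 2 = 6
do[k]
as.matrix(do)[2, 4]                   # same value, computed here for comparison only
```

The as.matrix call is there purely as a check; on a large dist object you would use only the index.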

一纸荒年 Trace。
Reply #6 · 2020-05-20 09:28

You could do this:

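# Getter: 'selection' is a string such as "3, 2" that is pasted into "[ ]"
d <- function(distance, selection){
  eval(parse(text = paste("as.matrix(distance)[",
               selection, "]")))
}

# Setter: modify a matrix copy, then convert back to dist.
# (Assigning to as.matrix(distance)[...] directly fails: there is no `as.matrix<-`.)
`d<-` <- function(distance, selection, value){
  m <- as.matrix(distance)
  eval(parse(text = paste("m[", selection, "] <- value")))
  as.dist(m)
}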

Which would allow you to do this:

 mat <- matrix(1:12, nrow=4)
 mat.d <- dist(mat)
 mat.d
        1   2   3
    2 1.7        
    3 3.5 1.7    
    4 5.2 3.5 1.7

 d(mat.d, "3, 2")
    [1] 1.7
 d(mat.d, "3, 2") <- 200
 mat.d
          1     2     3
    2   1.7            
    3   3.5 200.0      
    4   5.2   3.5   1.7

However, any changes you make to the diagonal or upper triangle are ignored. That may or may not be the right thing to do. If it isn't, you'll need to add some kind of sanity check or appropriate handling for those cases. And probably others.
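One possible way to handle those cases, sketched under the assumption that upper-triangle requests should be mirrored and diagonal requests rejected, is to write straight into the underlying vector. set_dist is a hypothetical helper, not something provided by stats, and it avoids both as.matrix and eval(parse()).

```r
# Hypothetical helper (not part of stats): overwrite one pairwise distance in place
set_dist <- function(d, i, j, value) {
  n <- attr(d, "Size")
  stopifnot(i != j, i >= 1, j >= 1, i <= n, j <= n)  # the diagonal is not stored
  lo <- min(i, j)                                    # smaller index of the pair
  hi <- max(i, j)                                    # larger index of the pair
  k <- n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo    # position formula from ?dist
  d[k] <- value                                      # subassignment keeps the attributes
  d
}

mat <- matrix(1:12, nrow = 4)
mat.d <- dist(mat)
mat.d <- set_dist(mat.d, 2, 3, 200)   # an upper-triangle pair is mirrored
as.matrix(mat.d)[3, 2]
```

Because the object is never expanded to a full matrix, this also stays usable when the dist object is large.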

迷人小祖宗
Reply #7 · 2020-05-20 09:31

This response is really just an extended follow-on to Christian A's earlier response. It is warranted because some readers of the question (myself included) may query the dist object as if it were symmetric, asking not just for (7,13) as below but also for (13,7). I don't have edit privileges, and the earlier answer was correct as long as the user treated the dist object as a dist object and not as a sparse matrix, which is why I am posting a separate response rather than an edit. Vote up Christian A for doing the heavy lifting if this answer is useful. The original answer, with my edits pasted in:

distdex<-function(i,j,n) #given row, column, and n, return index
    n*(i-1) - i*(i-1)/2 + j-i

rowcol<-function(ix,n) { #given index, return row and column
    nr=ceiling(n-(1+sqrt(1+4*(n^2-n-2*ix)))/2)
    nc=n-(2*n-nr+1)*nr/2+ix+nr
    cbind(nr,nc)
}
#A little test harness to show it works:

dist(rnorm(20))->testd
as.matrix(testd)[7,13]   #row<col
distdex(7,13,20) # =105
testd[105]   #same as above

But...

distdex(13,7,20) # =156
testd[156]   #the wrong answer

Christian A's function only works if i < j. For i = j and i > j it returns the wrong index. Modifying the distdex function to return 0 when i == j and to swap i and j when i > j solves the problem (note that 0 here is an index, so testd[0] yields an empty vector rather than a distance of 0):

distdex2<-function(i,j,n){ #given row, column, and n, return index
  if(i==j){0
  }else if(i > j){
    n*(j-1) - j*(j-1)/2 + i-j
  }else{
    n*(i-1) - i*(i-1)/2 + j-i  
  }
}

as.matrix(testd)[7,13]   #row<col
distdex2(7,13,20) # =105
testd[105]   #same as above
distdex2(13,7,20) # =105
testd[105]   #the same answer
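For looking up many pairs at once, the same idea can be vectorized. The sketch below is an assumption-laden variant rather than the answer's own code: dist_get is a hypothetical name, and unlike distdex2 it returns the distance itself (0 for i == j) instead of an index.

```r
# Hypothetical vectorized accessor; i and j may be vectors of equal length
dist_get <- function(d, i, j) {
  n <- attr(d, "Size")
  lo <- pmin(i, j)                   # element-wise smaller index
  hi <- pmax(i, j)                   # element-wise larger index
  k <- n * (lo - 1) - lo * (lo - 1) / 2 + hi - lo
  out <- numeric(length(k))          # distance 0 by default, used when i == j
  off <- lo != hi                    # pairs that are off the diagonal
  out[off] <- as.vector(d)[k[off]]
  out
}

set.seed(42)
testd <- dist(rnorm(20))
dist_get(testd, c(7, 13, 5), c(13, 7, 5))   # first two values equal, third is 0
```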