Subsetting a matrix with ids from another matrix

Posted 2019-03-01 13:12

Question:

I would like to subset the data of one matrix using the data in a second matrix. The columns of the first matrix are labeled. For example,

area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,217418,552303,269359,833696,936079,1472449,879042,220539,870581, 833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
mat1 <- matrix(area1, ncol=3, byrow=TRUE)
mat2 <- matrix(id, ncol=3, byrow=TRUE)
dimnames(mat1) <- list(NULL, c("a1","a2","a3"))

mat2 contains the ids for mat1, so the two matrices have the same dimensions (i.e., mat2[1,1] is the id of mat1[1,1]). What I want is to start a new submatrix of mat1 each time the row c(1, 2, 5) shows up in mat2. In this mini example, submatrix 1 would have 2 rows of data, submatrices 2 and 3 would have 1 row each, and submatrix 4 would have 4 rows of data from mat1. The number of rows between successive rows with 1, 2, 5 varies. Does this make sense?

Originally, the matrices were built from a data frame, with id in one column and area in a second column. I couldn't find a way to subset the varying number of rows between successive rows with id 1 within a data frame, which is why I switched to matrices.

Answer 1:

I think this covers it and fits with your description:

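# flag the rows of mat2 equal to c(1,2,5); cumsum turns the flags into group numbers
spl <- cumsum(apply(mat2, 1, function(x) all(x == c(1, 2, 5))))
# split the rows of mat1 by group number
split(as.data.frame(mat1), spl)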

#$`1`
#       a1     a2      a3
#1 9836374 635440   23018
#2  833696 936079 1472449
# 
#$`2`
#      a1     a2     a3
#3 879042 220539 870581
#
#$`3`
#      a1     a2     a3
#4 217418 552303 269359
#
#$`4`
#      a1     a2      a3
#5 833696 936079 1472449
#6 879042 220539  870581
#7 833696 936079 1472449
#8 879042 220539  870581

The result matches your description: "submatrix 1 would have 2 rows of data, submatrix 2 and 3 have 1 row each, and submatrix 4 would have 4 rows of data from mat1".
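
To see why the cumsum trick works, it can help to look at the intermediate vectors. Below is a minimal sketch that reuses the data from the question; the names is_start and submats are made up here for illustration, and indexing with drop = FALSE is just one way to get matrices back instead of data frames. It assumes the first row of mat2 is 1, 2, 5; otherwise the leading rows would land in a group labeled 0.

```r
area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,
           217418,552303,269359,833696,936079,1472449,879042,220539,870581,
           833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
mat1 <- matrix(area1, ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("a1", "a2", "a3")))
mat2 <- matrix(id, ncol = 3, byrow = TRUE)

# TRUE for every row of mat2 that equals c(1, 2, 5)
# (is_start is an illustrative name, not from the answer above)
is_start <- apply(mat2, 1, function(x) all(x == c(1, 2, 5)))
# TRUE FALSE TRUE TRUE TRUE FALSE FALSE FALSE

# running count of start rows: each row gets the number of its group
spl <- cumsum(is_start)
# 1 1 2 3 4 4 4 4

# split the row indices, then index mat1 so every piece stays a matrix
submats <- lapply(split(seq_len(nrow(mat1)), spl),
                  function(i) mat1[i, , drop = FALSE])
sapply(submats, nrow)
```

Each element of submats is a matrix with the a1, a2, a3 column names, so the result can be used wherever the original mat1 rows were expected.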



Answer 2:

split(as.data.frame(mat1), apply(mat2, 1, paste, collapse = " "))
#$`1 2 5`
#       a1     a2      a3
#1 9836374 635440   23018
#3  879042 220539  870581
#4  217418 552303  269359
#5  833696 936079 1472449
#
#$`30 31 34`
#      a1     a2      a3
#2 833696 936079 1472449
#6 879042 220539  870581
#
#$`51 52 55`
#      a1     a2      a3
#7 833696 936079 1472449
#
#$`81 82 85`
#      a1     a2     a3
#8 879042 220539 870581


Answer 3:

mat1[which(mat2[,1]==1 & mat2[,2]==2 & mat2[,3]==5),]
          a1     a2      a3
[1,] 9836374 635440   23018
[2,]  879042 220539  870581
[3,]  217418 552303  269359
[4,]  833696 936079 1472449
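
One thing to watch with this kind of indexing: when only a single row matches, `[` returns a plain vector unless drop = FALSE is given. A small sketch, assuming the data from the question; key and hits are hypothetical names introduced here, and the id triple is chosen only because it matches exactly one row.

```r
area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,
           217418,552303,269359,833696,936079,1472449,879042,220539,870581,
           833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
mat1 <- matrix(area1, ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("a1", "a2", "a3")))
mat2 <- matrix(id, ncol = 3, byrow = TRUE)

# hypothetical id triple to look for; it occurs in one row of mat2 only
key <- c(51, 52, 55)

# logical index: TRUE where the whole row of mat2 equals key
hits <- apply(mat2, 1, function(x) all(x == key))

# drop = FALSE keeps the result a 1 x 3 matrix instead of a vector
res <- mat1[hits, , drop = FALSE]
dim(res)
```

Using a logical index directly also makes the which() call unnecessary.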


Answer 4:

From what you said, I think you wanted to keep the data as a data frame. You can build submatrices by selecting the rows that have a particular id value.

Here, I rebuilt the data frame and extracted the rows with id 1. You can extend this by using cbind to combine the "area1" values selected for several ids.

> area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,217418,552303,269359,833696,936079,1472449,879042,220539,870581, 833696,936079,1472449,879042,220539,870581)
> id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
> original<-as.data.frame(cbind(id,area1))
> original[original$id==1,]
   id   area1
1   1 9836374
7   1  879042
10  1  217418
13  1  833696

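Then you can combine the columns with cbind, as described above. Note that this relies on ids 1, 2 and 5 each occurring the same number of times.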

> col1<-original[original$id==1,"area1"]
> col2<-original[original$id==2,"area1"]
> col3<-original[original$id==5,"area1"]
> submat<-cbind(col1,col2,col3)
> colnames(submat)<-c("a1","a2","a3")
> submat
          a1     a2      a3
[1,] 9836374 635440   23018
[2,]  879042 220539  870581
[3,]  217418 552303  269359
[4,]  833696 936079 1472449
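
If the aim is to stay in the long data frame and still get the variable-length blocks described in the question, one possible approach (my assumption, not something from the question) is to flag the rows where a 1, 2, 5 run begins and use cumsum on the flags as a grouping column. The names nxt1, nxt2, starts, block and blocks are invented for this sketch.

```r
area1 <- c(9836374,635440,23018,833696,936079,1472449,879042,220539,870581,
           217418,552303,269359,833696,936079,1472449,879042,220539,870581,
           833696,936079,1472449,879042,220539,870581)
id <- c(1,2,5,30,31,34,1,2,5,1,2,5,1,2,5,30,31,34,51,52,55,81,82,85)
original <- data.frame(id, area1)

# ids one and two rows ahead, padded with NA at the end
nxt1 <- c(original$id[-1], NA)
nxt2 <- c(original$id[-(1:2)], NA, NA)

# TRUE where a run 1, 2, 5 begins; %in% treats the NA padding as no match
starts <- original$id == 1 & nxt1 %in% 2 & nxt2 %in% 5

# running count of run starts gives a block number for every row
original$block <- cumsum(starts)

# one data frame per block, with id and area1 still side by side
blocks <- split(original, original$block)
sapply(blocks, nrow)
```

With the example data the blocks have 6, 3, 3 and 12 rows, which corresponds to the 2, 1, 1 and 4 matrix rows asked for in the question.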


Tags: r matrix subset