Normalize between-blocks by the maxima of the bloc

Posted 2019-09-15 05:43

Question:

This question is a follow-up to my previous question. Instead of normalizing blocks by their local maxima, I would like to normalize the between-blocks by the maximum of the diagonal block corresponding to their columns.

#dummy data
mat <- matrix(round(runif(81, 0, 50)), 9, 9)
rownames(mat) <- rep(LETTERS[1:3],3)
colnames(mat) <- rep(LETTERS[1:3],3)

#Normalizes within and between blocks by the maxima of the focal block
ans <- mat / ave(mat, rownames(mat)[row(mat)], colnames(mat)[col(mat)], FUN = max)

#Number of blocks
sum(ans == 1)
#[1] 9

I would like to normalize the between-blocks, i.e., AB, AC, BA, BC, CA and CB, by the maximum of the block corresponding to their columns. For example, AB should be normalized by the max() of BB, AC by the max() of CC, and so on.

> mat[rownames(mat)=="A",colnames(mat)=="B"]
  B  B  B
A 26 26 14
A 12 11 18
A 44 44 29

> mat[rownames(mat)=="B",colnames(mat)=="B"]
  B  B  B
B  9 23 20
B 28 45 28
B 14 12 45

In this case, the between-block AB should be normalized not by its own maximum (i.e. 44) but by the maximum of block BB (i.e. 45).
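
As a minimal sketch of the desired result for this one block, the snippet below copies the AB and BB values printed above into two small matrices and divides AB by the maximum of BB. The names AB, BB and AB_norm are only illustrative.

```r
# Values copied from the AB and BB blocks printed above.
AB <- matrix(c(26, 26, 14,
               12, 11, 18,
               44, 44, 29), nrow = 3, byrow = TRUE)
BB <- matrix(c( 9, 23, 20,
               28, 45, 28,
               14, 12, 45), nrow = 3, byrow = TRUE)

# Desired result: AB divided by max(BB), which is 45, not by max(AB), which is 44.
AB_norm <- AB / max(BB)
round(AB_norm, 3)
```

Because max(AB) is smaller than max(BB), every entry of AB_norm stays below 1.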

Any pointers are highly appreciated!

Answer 1:

Let cn be the vector formed by first replacing each element of mat with its column name and then unravelling the resulting matrix column by column. Do the same with the row names to get rn.

(cn == rn) * mat is the same as mat except all non-diagonal blocks are zeroed out.

v is a vector whose names are the unique column names and whose values are the maxima of the corresponding diagonal blocks. The construction of v depends on the fact that the maxima are 0 or more, because the zeroed-out entries also take part in each max.
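
If mat could contain negative values, a zeroed-out off-diagonal entry would exceed every entry of an all-negative diagonal block, and v would wrongly come out as 0. One possible workaround, sketched here under that assumption, is to mask with -Inf instead of 0. The small matrix m and the names v_zero and v_inf are hypothetical and not part of the original answer.

```r
# Hypothetical 4x4 example with two blocks, A and B, whose diagonal
# blocks contain only negative entries.
m <- matrix(c(-5, -2,  7,  3,
              -4, -1,  8,  6,
               9,  4, -3, -6,
               2,  5, -7, -8), nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B")))

cn <- colnames(m)[col(m)]
rn <- rownames(m)[row(m)]

# Masking with zeros: the zeros win, so both maxima are reported as 0.
v_zero <- tapply((cn == rn) * m, cn, max)

# Masking with -Inf: only the diagonal-block entries can be the maximum.
v_inf <- tapply(ifelse(cn == rn, m, -Inf), cn, max)
```

Here v_inf holds the true diagonal-block maxima (-1 for A and -3 for B), while v_zero is 0 for both.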

replace(mat, TRUE, v[cn]) is the matrix formed by replacing each element of mat with the maximum of the diagonal block in its column. Finally, we divide mat by that matrix.

Note that if any diagonal block is all zeros, then its columns will contain no finite values: in R, 0/0 gives NaN and a positive value divided by 0 gives Inf. However, there should not be any problem if any off-diagonal blocks are all zeros.
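
The zero-block case can be checked with a small sketch. The 4x4 matrix m below is a hypothetical example whose BB diagonal block is all zeros; it is not taken from the question.

```r
# Hypothetical 4x4 matrix with blocks A and B; the BB block is all zeros.
m <- matrix(c(4, 2, 3, 0,
              1, 8, 5, 6,
              7, 3, 0, 0,
              2, 6, 0, 0), nrow = 4, byrow = TRUE,
            dimnames = list(c("A", "A", "B", "B"), c("A", "A", "B", "B")))

cn <- colnames(m)[col(m)]
rn <- rownames(m)[row(m)]

# Diagonal-block maxima: 8 for A, 0 for B.
v <- tapply((cn == rn) * m, cn, max)

# Columns 3 and 4 (block B) are divided by 0.
res <- m / replace(m, TRUE, v[cn])
res
```

The A columns are simply divided by 8, while every entry in the B columns is either NaN (0/0) or Inf (positive/0).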

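cn <- colnames(mat)[col(mat)]  # column name of each element, column by column
rn <- rownames(mat)[row(mat)]  # row name of each element, in the same order
# maximum of each diagonal block, named by block
v <- tapply((cn == rn) * mat, cn, max)
# divide each element by the diagonal-block maximum for its column
mat / replace(mat, TRUE, v[cn])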
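
To sanity-check the approach end to end, the sketch below rebuilds the dummy data with a fixed seed and compares one off-diagonal block against the value requested in the question. The seed and the names res, AB and BB are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
mat <- matrix(round(runif(81, 0, 50)), 9, 9)  # 81 values fill the 9 x 9 matrix
rownames(mat) <- rep(LETTERS[1:3], 3)
colnames(mat) <- rep(LETTERS[1:3], 3)

cn <- colnames(mat)[col(mat)]
rn <- rownames(mat)[row(mat)]
v <- tapply((cn == rn) * mat, cn, max)
res <- mat / replace(mat, TRUE, v[cn])

# The normalized AB block should equal the raw AB block divided by max(BB).
AB <- mat[rownames(mat) == "A", colnames(mat) == "B"]
BB <- mat[rownames(mat) == "B", colnames(mat) == "B"]
all.equal(res[rownames(res) == "A", colnames(res) == "B"], AB / max(BB))

# The diagonal block BB is divided by its own maximum, so its maximum becomes 1.
max(res[rownames(res) == "B", colnames(res) == "B"])
```

Unlike the within-block normalization from the question, off-diagonal blocks in res can contain values greater than 1 whenever their raw maximum exceeds that of the diagonal block in the same columns.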