Group variables by clusters on heatmap in R

Posted 2019-07-24 03:50

Question:

I am trying to reproduce the first figure of this paper on graph clustering:

Here is a sample of my adjacency matrix:

data=cbind(c(48,0,0,0,0,1,3,0,1,0),c(0,75,0,0,3,2,1,0,0,1),c(0,0,34,1,16,0,3,0,1,1),c(0,0,1,58,0,1,3,1,0,0),c(0,3,16,0,181,6,6,0,2,2),c(1,2,0,1,6,56,2,1,0,1),c(3,1,3,3,6,2,129,0,0,1),c(0,0,0,1,0,1,0,13,0,1),c(1,0,1,0,2,0,0,0,70,0),c(0,1,1,0,2,1,1,1,0,85))
colnames(data)=letters[1:nrow(data)]
rownames(data)=colnames(data)

And with these commands I obtain the following heatmap:

library(reshape)   # melt()
library(ggplot2)
library(scales)    # rescale()
data.m=melt(data)
data.m[,"rescale"]=round(rescale(data.m[,"value"]),3)
p=ggplot(data.m,aes(X1, X2))+geom_tile(aes(fill=rescale),colour="white") 
p=p+scale_fill_gradient(low="white",high="black")
p+theme(text=element_text(size=10),axis.text.x=element_text(angle=90,vjust=0)) 

This is very similar to the plot on the left of Figure 1 above. The only differences are that (1) the nodes are not ordered randomly but alphabetically, and (2) instead of just having binary black/white pixels, I am using a "shades of grey" palette to be able to show the strength of the co-occurrence between nodes.
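
For reference, the `rescale` step maps the raw counts linearly onto [0, 1] before the white-to-black gradient is applied. Below is a minimal base-R sketch of that mapping. It assumes `scales::rescale()` is called with its default arguments, and `rescale01` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: linearly map x from its observed range onto [0, 1],
# mirroring scales::rescale() with default arguments.
# Assumes x is not constant (otherwise the denominator is zero).
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# A few counts taken from the sample matrix (0 up to the largest entry, 181)
counts <- c(0, 1, 3, 16, 181)
round(rescale01(counts), 3)
```

In the sample matrix the largest value is a diagonal entry (181). As a result, the largest off-diagonal count, 16, maps to only about 0.088, so off-diagonal cells stay pale.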

But the point is that it is very hard to distinguish any cluster structure (and this would be even more true with the full set of 100 nodes). So, I want to order my vertices by clusters on the heatmap. I have this membership vector from a community detection algorithm:

membership=c(1,2,4,2,5,3,1,2,2,3)

Now, how can I obtain a heatmap similar to the plot on the right of Figure 1 above?

Thanks a lot in advance for any help.

PS: I have experimented with the answers to "R draw kmeans clustering with heatmap" and "R: How do I display clustered matrix heatmap (similar color patterns are grouped)", but could not get what I want.

Answer 1:

It turned out this was extremely easy. I am still posting the solution so that others in my situation don't waste time on it like I did.

The first part is exactly the same as before:

data.m=melt(data)
data.m[,"rescale"]=round(rescale(data.m[,"value"]),3)

Now, the trick is to reorder the factor levels of the X1 and X2 columns of the melted data.frame by cluster membership, because ggplot2 draws a discrete axis in the order of its factor levels:

# node indices sorted by cluster label; membership[i] must refer to rownames(data)[i]
ord=order(membership)
data.m[,"X1"]=factor(data.m[,"X1"],levels=rownames(data)[ord])
data.m[,"X2"]=factor(data.m[,"X2"],levels=colnames(data)[ord])
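
Why this works: `order(membership)` returns the node indices sorted by cluster label. Within a cluster, ties keep their original (here alphabetical) order because `order()` is stable. ggplot2 then places the categories of a discrete axis in the order of the factor levels. The small base-R sketch below uses the `membership` vector from the question. `nodes` stands in for `colnames(data)`, and `x` is a hypothetical factor used only for illustration:

```r
membership <- c(1, 2, 4, 2, 5, 3, 1, 2, 2, 3)
nodes <- letters[1:10]          # same names as colnames(data)

ord <- order(membership)        # permutation that groups nodes by cluster
ord                             # 1 7 2 4 8 9 6 10 3 5
nodes[ord]                      # "a" "g" "b" "d" "h" "i" "f" "j" "c" "e"

# A factor with these levels is what ggplot2 uses to order a discrete axis
x <- factor(c("c", "a", "g"), levels = nodes[ord])
levels(x)
as.integer(x)                   # axis positions: c is 9th, a is 1st, g is 2nd
```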

Then, plot the heatmap (same as before):

p=ggplot(data.m,aes(X1, X2))+geom_tile(aes(fill=rescale),colour="white") 
p=p+scale_fill_gradient(low="white",high="black")
p+theme(text=element_text(size=10),axis.text.x=element_text(angle=90,vjust=0))

This time, the cluster structure is clearly visible.
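
One optional refinement, not part of the original answer, is to draw a line between consecutive clusters. The sketch below assumes the axes are ordered by `order(membership)` as above. Once nodes are sorted by cluster, each boundary falls after the cumulative size of the preceding clusters. On ggplot2's discrete scale, categories sit at integer positions 1, 2, ..., so the lines go at those cumulative sizes plus 0.5. `sizes`, `boundaries` and `cluster_lines` are hypothetical names, and the ggplot2 layers are only built if the package is installed:

```r
membership <- c(1, 2, 4, 2, 5, 3, 1, 2, 2, 3)

# Cluster sizes in increasing-label order, the same order order(membership) uses
sizes <- table(membership)

# Boundaries fall halfway between the last node of one cluster and the first
# node of the next; the final cumulative sum is the plot edge, so drop it.
cum <- cumsum(as.vector(sizes))
boundaries <- head(cum, -1) + 0.5
boundaries                      # 2.5 6.5 8.5 9.5

if (requireNamespace("ggplot2", quietly = TRUE)) {
  # Layers to add to the heatmap object `p` from the answer (not rebuilt here)
  cluster_lines <- list(
    ggplot2::geom_vline(xintercept = boundaries, colour = "red"),
    ggplot2::geom_hline(yintercept = boundaries, colour = "red")
  )
}
```

With the `p` built in the answer, something like `p + cluster_lines + theme(...)` should overlay the separators on the reordered heatmap.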