selecting number of leaf nodes of dendrogram in he

Posted 2019-09-18 13:21

In MATLAB you can set the number of leaf nodes to plot directly in the dendrogram function: dendrogram(tree,P) generates a dendrogram plot with no more than P leaf nodes.

My attempts to do the same with heatmap.2 in R have failed miserably. Posts on Stack Overflow and Biostars suggest using cutree, but heatmap.2 fails when I pass their suggestions to the Rowv option. Here "TAD" is the data matrix, with 8 columns and 831 rows.

library(gplots)  # provides heatmap.2

# cluster it
hr <- hclust(dist(TAD, method="manhattan"), method="average")

# draw the heat map (my_palette is assumed to be a colour vector defined earlier)
heatmap.2(TAD, main="Hierarchical Cluster",
          Rowv=as.dendrogram(cutree(hr, k=5)),
          Colv=NA, dendrogram="row", col=my_palette, density.info="none", trace="none")

returns the message:

Error in UseMethod("as.dendrogram") : 
  no applicable method for 'as.dendrogram' applied to an object of class "c('integer', 'numeric')"

Is cutree the right avenue for plotting a restricted dendrogram? Is there an easier way to do this, akin to MATLAB?

2 answers
再贱就再见
Reply #2 · 2019-09-18 13:54

Just to clarify and provide some data: I do not want to drop any of the rows. Instead of plotting and interpreting 831 branches, I would like to interpret 3, so I would like the row dendrogram to be constrained to 3 branches (cut at height 150) and the heatmap of all 831 rows to be grouped under the 3 upper branches of the original dendrogram.

# Here is a random n=10 subset of my data: for each of 10 observed fish, the % of time
# it spent within each depth bin (Bin1-Bin8)

zz <- "ID Bin1 Bin2 Bin3 Bin4 Bin5 Bin6 Bin7 Bin8
1    0    0    0    0    0  0.0   0.0 100.0
2    0    0    0    0    0  0.0   0.0 100.0
3    0    0    0    0    0  0.0   0.0 100.0
4    0    0    0    0    0 70.8  29.2   0.0
5    0    0    0  100    0  0.0   0.0   0.0
6    0    0    0    0    0  0.0  93.3   6.7
7    0    0    0    0    0 27.5  72.5   0.0
8    0    0    0    0    0 53.5  46.5   0.0
9    0    0    0    0    0  0.0 100.0   0.0
10    0    0    0    0    0  0.0  72.1  27.9 "

TAD <- read.table(text=zz, header = TRUE)
IDnames <- TAD[,1]
x<-data.matrix(TAD[,2:ncol(TAD)])
rownames(x) <- IDnames

Setting the heatmap aside for now, the distance matrix and hierarchical clustering are computed on the numeric matrix x:

TAD.dist <- dist(x, method="manhattan", diag=FALSE, upper=FALSE)
TAD.cluster <- hclust(TAD.dist, method="average", members=NULL)

A plot of the resulting dendrogram shows all ten branches:

plot(TAD.cluster)

but cutting it at height 150 restricts it to only 3 branches:

hcd = as.dendrogram(TAD.cluster)
rowDend<- cut(hcd, h = 150)$upper
plot(rowDend)
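To see what cut() actually returns here, the sketch below rebuilds the 10-fish subset and compares it with cutree() at the same height. It uses only base R (stats); the names grp and parts are illustrative. As the printout should show, cutree() keeps one group id per row, while the leaves of $upper are placeholders standing for whole branches rather than rows, which is presumably why $upper cannot simply replace the full row dendrogram in heatmap.2.

```r
# Rebuild the 10-fish subset from the post (ID column dropped, IDs as row names)
zz <- "ID Bin1 Bin2 Bin3 Bin4 Bin5 Bin6 Bin7 Bin8
1  0 0 0   0 0  0.0   0.0 100.0
2  0 0 0   0 0  0.0   0.0 100.0
3  0 0 0   0 0  0.0   0.0 100.0
4  0 0 0   0 0 70.8  29.2   0.0
5  0 0 0 100 0  0.0   0.0   0.0
6  0 0 0   0 0  0.0  93.3   6.7
7  0 0 0   0 0 27.5  72.5   0.0
8  0 0 0   0 0 53.5  46.5   0.0
9  0 0 0   0 0  0.0 100.0   0.0
10 0 0 0   0 0  0.0  72.1  27.9"
TAD <- read.table(text = zz, header = TRUE)
x <- data.matrix(TAD[, -1])
rownames(x) <- TAD$ID

TAD.cluster <- hclust(dist(x, method = "manhattan"), method = "average")

# cutree(): one group id per row, all 10 rows kept
grp <- cutree(TAD.cluster, h = 150)
print(table(grp))

# cut(): $upper is the tree above h, $lower holds one sub-dendrogram per branch
parts <- cut(as.dendrogram(TAD.cluster), h = 150)
print(length(parts$lower))                                  # number of branches
print(sapply(parts$lower, function(d) attr(d, "members")))  # rows in each branch
print(labels(parts$upper))                                  # placeholder leaf labels
```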

The dendrogram drawn by plot(rowDend) is what I would like to see as the row dendrogram of the following heatmap:

library(gplots)
heatmap.2(x,
          distfun = function(x) dist(x, method='manhattan', diag=FALSE, upper=FALSE),
          hclustfun = function(x) hclust(x, method = 'average'),
          dendrogram = "row",
          #Rowv=rowDend, # this is where I thought I could restrict the row dendrogram
          Colv = FALSE,  # no column clustering
          trace = "none")

But I cannot find any way to restrict the row dendrogram in heatmap.2 to the desired number of interpretable branches, and plotting all 831 branches is extremely messy.
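One possible workaround, sketched here under my own assumptions rather than as a documented heatmap.2 feature: keep the full dendrogram (so every row stays in the heatmap, in clustered order) but set the height of every node that merges at or below the cut to 0, so that only the branches above h = 150 are drawn. The helpers flatten_below() and node_heights() are hypothetical; dendrapply() is from base stats, and RowSideColors is the heatmap.2 argument for a per-row colour bar, used here to mark the cutree() groups. The three hex colours are an arbitrary choice.

```r
library(gplots)  # heatmap.2

# Same 10-fish subset as in the post
zz <- "ID Bin1 Bin2 Bin3 Bin4 Bin5 Bin6 Bin7 Bin8
1  0 0 0   0 0  0.0   0.0 100.0
2  0 0 0   0 0  0.0   0.0 100.0
3  0 0 0   0 0  0.0   0.0 100.0
4  0 0 0   0 0 70.8  29.2   0.0
5  0 0 0 100 0  0.0   0.0   0.0
6  0 0 0   0 0  0.0  93.3   6.7
7  0 0 0   0 0 27.5  72.5   0.0
8  0 0 0   0 0 53.5  46.5   0.0
9  0 0 0   0 0  0.0 100.0   0.0
10 0 0 0   0 0  0.0  72.1  27.9"
TAD <- read.table(text = zz, header = TRUE)
x <- data.matrix(TAD[, -1])
rownames(x) <- TAD$ID

hc  <- hclust(dist(x, method = "manhattan"), method = "average")
hcd <- as.dendrogram(hc)

# Hypothetical helper: zero the height of every node at or below h, so only
# merges above h remain visible; the leaves and their order are untouched.
flatten_below <- function(dend, h) {
  dendrapply(dend, function(node) {
    if (attr(node, "height") <= h) attr(node, "height") <- 0
    node
  })
}

# Hypothetical helper: collect the height of every node (leaves included).
node_heights <- function(dend) {
  hs <- numeric(0)
  invisible(dendrapply(dend, function(node) {
    hs <<- c(hs, attr(node, "height"))
    node
  }))
  hs
}

cut_h    <- 150
flat     <- flatten_below(hcd, cut_h)
grp      <- cutree(hc, h = cut_h)
grp_cols <- c("#1B9E77", "#D95F02", "#7570B3")[grp]  # one colour per cutree group

heatmap.2(x,
          Rowv = flat,           # used as-is: all rows, only branches above 150 visible
          Colv = FALSE,          # keep Bin1-Bin8 in their original order
          dendrogram = "row",
          RowSideColors = grp_cols,
          trace = "none",
          density.info = "none")
```

For the full 831-row matrix the same calls should apply unchanged; if the row labels become unreadable, passing labRow = rep("", nrow(x)) to heatmap.2 is one way to blank them.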

对你真心纯属浪费
Reply #3 · 2019-09-18 13:59

The question is what you mean when you write "selecting number of leaf nodes".

The Rowv parameter in heatmap.2 expects a dendrogram, a logical value (TRUE/FALSE) or NULL, or a vector of integers. From the help file:

Rowv = determines if and how the row dendrogram should be reordered. By default, it is TRUE, which implies dendrogram is computed and reordered based on row means. If NULL or FALSE, then no dendrogram is computed and no reordering is done. If a dendrogram, then it is used "as-is", ie without any reordering. If a vector of integers, then dendrogram is computed and reordered based on the order of the vector.

So cutree(hr, k=5) gives you a vector of integers (telling you which cluster each item belongs to, in a cut that produces 5 clusters). Calling as.dendrogram on it does not produce a dendrogram, hence Rowv=as.dendrogram(cutree(hr, k=5)) throws an error.
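The distinction can be checked directly in base R. This sketch uses the built-in USArrests data as a stand-in for TAD (an assumption, since the full matrix is not posted) and shows that cutree() returns per-row cluster ids, while as.dendrogram() applied to the hclust object itself, not to the cutree() result, is what yields something Rowv can use.

```r
# USArrests stands in for TAD here (assumption: any numeric matrix behaves the same)
hr <- hclust(dist(USArrests, method = "manhattan"), method = "average")

groups <- cutree(hr, k = 5)       # a named integer vector, one cluster id per row
str(groups)
print(table(groups))              # how many rows fall in each of the 5 clusters

row_dend <- as.dendrogram(hr)     # an actual tree object, usable as Rowv
print(class(row_dend))
print(attr(row_dend, "members"))  # number of leaves = number of rows
```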

If you want to highlight some of the branches in your tree, I invite you to look into the dendextend package to see which solution works best for you. Here is an example that may be what you are asking for:

library(gplots)
library(dendextend)  # set(), rotate_DendSer(); rotate_DendSer() also needs the DendSer package
data(mtcars) 
x  <- as.matrix(mtcars)

# now let's spice up the dendrograms a bit:
Rowv  <- x %>% dist %>% hclust %>% as.dendrogram %>%
   set("branches_k_color", k = 3) %>% set("branches_lwd", 4) %>%
   rotate_DendSer(ser_weight = dist(x))
Colv  <- x %>% t %>% dist %>% hclust %>% as.dendrogram %>%
   set("branches_k_color", k = 2) %>% set("branches_lwd", 4) %>%
   rotate_DendSer(ser_weight = dist(t(x)))

heatmap.2(x, Rowv = Rowv, Colv = Colv)

With the following output:

[Image: output of the heatmap.2 call above]
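Applied to the 10-fish subset from the question, the same idea might look like the sketch below: keep the full row dendrogram (so all rows stay in the heatmap) and let dendextend's color_branches() colour the branches defined by the question's cut height of 150. This assumes gplots and dendextend are installed and leaves the branch colours to dendextend's defaults.

```r
library(gplots)      # heatmap.2
library(dendextend)  # color_branches(), nleaves()

# The 10-fish subset posted in the question
zz <- "ID Bin1 Bin2 Bin3 Bin4 Bin5 Bin6 Bin7 Bin8
1  0 0 0   0 0  0.0   0.0 100.0
2  0 0 0   0 0  0.0   0.0 100.0
3  0 0 0   0 0  0.0   0.0 100.0
4  0 0 0   0 0 70.8  29.2   0.0
5  0 0 0 100 0  0.0   0.0   0.0
6  0 0 0   0 0  0.0  93.3   6.7
7  0 0 0   0 0 27.5  72.5   0.0
8  0 0 0   0 0 53.5  46.5   0.0
9  0 0 0   0 0  0.0 100.0   0.0
10 0 0 0   0 0  0.0  72.1  27.9"
TAD <- read.table(text = zz, header = TRUE)
x <- data.matrix(TAD[, -1])
rownames(x) <- TAD$ID

hc <- hclust(dist(x, method = "manhattan"), method = "average")

# One colour per branch of a cut at height 150; every leaf (row) is kept
row_dend <- color_branches(as.dendrogram(hc), h = 150)

heatmap.2(x, Rowv = row_dend, Colv = FALSE, dendrogram = "row",
          trace = "none", density.info = "none")
```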

Consider also looking at the recently published dendextend tutorial. You may want to work with the branches_attr_by_labels function (in the tutorial it is under the section "Adjusting branches based on labels"), which lets you manipulate dendrograms to create plots such as this:

[Image: example dendrogram from the dendextend tutorial]

If what you want is to remove nodes and leave only a few of them to be plotted, you should probably just create the heatmap for a subset of the data. You can also look at the prune function in dendextend (for looking at smaller dendrograms in general), but for a heatmap it is better to just work with the relevant subset of your data.
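A minimal sketch of the subset route, reusing the mtcars matrix from the example above; keeping cluster 1 of a k = 3 cut is an arbitrary choice made only for illustration.

```r
library(gplots)  # heatmap.2

x   <- as.matrix(mtcars)
hc  <- hclust(dist(x))
grp <- cutree(hc, k = 3)

# Illustrative choice: keep only the rows that fall in cluster 1
sub_x <- x[grp == 1, , drop = FALSE]

# The heatmap (and its dendrograms) is then computed on the subset alone
heatmap.2(sub_x, trace = "none")
```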
