Understanding heatmap dendrogram clustering in R

Posted 2019-02-15 18:18

Question:

I would appreciate any information on the dendrograms (Colv, Rowv) in R's heatmap function, such as how the clustering works (is it Euclidean distance?). You don't have to post lengthy explanations; I would be happy with some keywords that put me on the right track for my own online research.

Here is an excerpt from the help manual, which confuses me a little bit. What does "honored" mean in this context and how is it different from reordering?

If either Rowv or Colv are dendrograms they are honored (and not reordered).

Answer 1:

Rowv and Colv control whether the rows and columns of your data set should be reordered and if so how.

The possible values for them are TRUE, NULL, FALSE, a vector of integers, or a dendrogram object.

  • In the default mode TRUE, heatmap.2 (from the gplots package) performs clustering using the hclustfun and distfun parameters. This defaults to complete linkage clustering with a Euclidean distance measure. The dendrogram is then reordered using the row/column means. You can change this by passing different functions to hclustfun or distfun. For example, to use the Manhattan distance rather than the Euclidean distance you would do:

    heatmap.2(x, ..., distfun = function(y) dist(y, method = "manhattan"))

    Check out ?dist and ?hclust. If you want to learn more about clustering, you could start by searching for "distance measures" and "agglomeration methods".

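  • If Rowv/Colv is NULL or FALSE, then no reordering or clustering is done and the matrix is plotted as-is. (This describes heatmap.2. In base heatmap, NULL is the default and still clusters; NA is the value that suppresses the dendrogram.)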

  • If Rowv/Colv is a numeric vector, then the clustering is computed as for TRUE and the reordering of the dendrogram is done using the vector supplied to Rowv/Colv.

  • If Rowv/Colv is a dendrogram object, then this dendrogram will be used to reorder the matrix. Dendrogram objects can be generated, for example, by:

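    rowDistance <- dist(x, method = "manhattan")            # pairwise distances between rows
    rowCluster <- hclust(rowDistance, method = "complete")  # agglomerative clustering
    rowDend <- as.dendrogram(rowCluster)                    # convert to a dendrogram object
    rowDend <- reorder(rowDend, rowMeans(x))                # rotate branches by row means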
    

    This produces a complete-linkage clustering on the Manhattan distance, ordered by row means. You can now pass rowDend to Rowv:

    heatmap.2(x, ..., Rowv = rowDend)
    

    This can be useful if, for example, you want to cluster the rows and columns in different ways, use a clustering that someone else has given you, or do something funky that cannot be accommodated by just specifying hclustfun and distfun. This is what is meant by "the dendrogram is honored": it is used as-is instead of whatever hclustfun and distfun would have produced.
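As a rough check of the points above, the following sketch uses base heatmap (the function the question asks about) so that it runs without extra packages. The matrix x is made-up random data, and the variable names are only illustrative. It rebuilds the default row dendrogram by hand, builds a custom Manhattan one, and compares the row order returned by heatmap with the order of each dendrogram.

```r
set.seed(1)
# Made-up example data: 10 rows, 6 columns
x <- matrix(rnorm(60), nrow = 10,
            dimnames = list(paste0("r", 1:10), paste0("c", 1:6)))

# Default behaviour rebuilt by hand: Euclidean distance, complete linkage,
# branches reordered by row means
rowDendDefault <- reorder(as.dendrogram(hclust(dist(x))), rowMeans(x))

# A custom dendrogram: Manhattan distance, complete linkage
rowDendCustom <- reorder(
  as.dendrogram(hclust(dist(x, method = "manhattan"), method = "complete")),
  rowMeans(x)
)

pdf(NULL)                                      # draw to a null device, no file is written
hmDefault <- heatmap(x)                        # lets heatmap do the clustering
hmCustom  <- heatmap(x, Rowv = rowDendCustom)  # supplies the dendrogram
invisible(dev.off())

# heatmap invisibly returns the row order it used
sameAsDefault <- identical(hmDefault$rowInd, order.dendrogram(rowDendDefault))
honoredCustom <- identical(hmCustom$rowInd, order.dendrogram(rowDendCustom))
```

If both flags are TRUE, the default run matches the hand-built Euclidean/complete dendrogram, and the supplied dendrogram was used without being recomputed. The same dendrogram objects can be passed to Rowv in heatmap.2.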



Answer 2:

To look into how it handles Rowv/Colv exactly, you might also use body(heatmap) to display its source.
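Printing the whole body can be a lot to read, so here is a small sketch of one way to narrow it down. The search pattern and the variable names src and clusterLines are just illustrative choices, not anything prescribed by the help page.

```r
# Default values of the arguments discussed in this thread
str(as.list(formals(heatmap))[c("Rowv", "Colv", "distfun", "hclustfun")])

# Turn the function body into text and keep only the lines that
# mention dendrograms, clustering, or reordering
src <- deparse(body(heatmap))
clusterLines <- grep("dendrogram|hclustfun|reorderfun", src, value = TRUE)
cat(clusterLines, sep = "\n")
```

The lines that test whether Rowv inherits from "dendrogram" are the ones that decide between using a supplied dendrogram and computing a new one.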



Answer 3:

From the manual:

distfun : function used to compute the distance (dissimilarity) between both rows and columns. Defaults to dist.

hclustfun : function used to compute the hierarchical clustering when Rowv or Colv are not dendrograms. Defaults to hclust. Should take as argument a result of distfun and return an object to which as.dendrogram can be applied.

dist() defaults to the Euclidean distance and hclust() to the complete linkage method.
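These defaults can be confirmed directly in an R session. The sketch below uses made-up random data and illustrative variable names. It reads the default method arguments, compares the default calls with explicit ones, and checks one distance by hand.

```r
set.seed(42)
x <- matrix(rnorm(40), nrow = 8)   # made-up data: 8 rows, 5 columns

# Default 'method' arguments as declared in the function signatures
distDefaultMethod   <- formals(dist)$method
hclustDefaultMethod <- formals(hclust)$method

# Default call versus explicit Euclidean distance
dDefault <- dist(x)
dEuclid  <- dist(x, method = "euclidean")

# Euclidean distance between rows 1 and 2, computed by hand
byHand <- sqrt(sum((x[1, ] - x[2, ])^2))

# Default call versus explicit complete linkage
hcDefault  <- hclust(dDefault)
hcComplete <- hclust(dDefault, method = "complete")

stopifnot(
  isTRUE(all.equal(as.matrix(dDefault), as.matrix(dEuclid))),
  isTRUE(all.equal(as.matrix(dDefault)[1, 2], byHand)),
  identical(hcDefault$merge, hcComplete$merge),
  identical(hcDefault$height, hcComplete$height)
)
```

stopifnot() is silent when every check passes, so no output here means the defaults behave as described.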