I would appreciate any information on the dendrograms (Colv, Rowv) in R's heatmap function, such as how the clustering works (is it Euclidean distance?). You don't have to post lengthy explanations; I would already be happy with some keywords that could put me on the right track for some online research.
Here is an excerpt from the help manual, which confuses me a little bit. What does "honored" mean in this context and how is it different from reordering?
If either Rowv or Colv are dendrograms they are honored (and not
reordered).
Rowv and Colv control whether the rows and columns of your data set should be reordered and, if so, how. The possible values for them are TRUE, NULL, FALSE, a vector of integers, or a dendrogram object.
In the default mode, TRUE, heatmap.2 performs clustering using the hclustfun and distfun parameters. This defaults to complete linkage clustering, using a Euclidean distance measure. The dendrogram is then reordered using the row/column means. You can control this by specifying different functions for hclustfun or distfun. For example, to use the Manhattan distance rather than the Euclidean distance, you would do:
heatmap.2(x,...,distfun=function (y) dist(y,method = "manhattan") )
Check out ?dist and ?hclust. If you want to learn more about clustering, you could start with "distance measures" and "agglomeration methods".
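As a rough sketch of what custom distfun and hclustfun functions do, the steps described above can be reproduced by hand in base R. The toy matrix x and the names myDist and myHclust are made up for illustration, and this only mirrors the description above rather than the exact internals of heatmap.2.

```r
# Toy data: 4 rows, 2 columns (illustrative values only)
x <- matrix(c(0, 0,
              3, 4,
              6, 8,
              1, 1),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("r", 1:4), c("a", "b")))

# Hypothetical helpers in the form expected by distfun / hclustfun
myDist   <- function(y) dist(y, method = "manhattan")
myHclust <- function(d) hclust(d, method = "average")

dEuc <- dist(x)     # default measure: euclidean
dMan <- myDist(x)   # manhattan

# Distance between r1 and r2 under each measure
as.matrix(dEuc)["r1", "r2"]   # 5 (sqrt(3^2 + 4^2))
as.matrix(dMan)["r1", "r2"]   # 7 (3 + 4)

# Cluster, convert to a dendrogram, then reorder the leaves by row means
dend <- reorder(as.dendrogram(myHclust(dMan)), rowMeans(x))
rowOrder <- order.dendrogram(dend)   # row indices in plotting order
```

Passing myDist and myHclust as distfun and hclustfun should make the heatmap function run the same two steps internally.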
If Rowv/Colv is NULL or FALSE, then no reordering or clustering is done and the matrix is plotted as-is.
If Rowv/Colv is a numeric vector, then the clustering is computed as for TRUE, and the reordering of the dendrogram is done using the vector supplied to Rowv/Colv.
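To illustrate the difference between clustering and reordering, here is a small base R sketch. It assumes the built-in mtcars data and a made-up weight vector wts standing in for a numeric Rowv; reordering only flips branches of the tree and leaves the clustering itself untouched.

```r
x <- as.matrix(mtcars[1:8, 1:4])

hc   <- hclust(dist(x))     # euclidean distance, complete linkage
dend <- as.dendrogram(hc)

# Hypothetical weights, one per row (this plays the role of a numeric Rowv)
wts <- seq_len(nrow(x))

dendByMeans <- reorder(dend, rowMeans(x))   # what the TRUE mode is described to do
dendByWts   <- reorder(dend, wts)           # reordering by a user-supplied vector

# The leaf order may differ between the two ...
order.dendrogram(dendByMeans)
order.dendrogram(dendByWts)

# ... but the tree is the same: cophenetic distances are unchanged
rn <- rownames(x)
sameTree <- all.equal(as.matrix(cophenetic(dendByMeans))[rn, rn],
                      as.matrix(cophenetic(dendByWts))[rn, rn])
```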
If Rowv/Colv is a dendrogram object, then this dendrogram will be used to reorder the matrix. Dendrogram objects can be generated, for example, by:
rowDistance = dist(x, method = "manhattan")
rowCluster = hclust(rowDistance, method = "complete")
rowDend = as.dendrogram(rowCluster)
rowDend = reorder(rowDend, rowMeans(x))
which generates a complete linkage clustering on a Manhattan distance, ordered by row means. You can now pass rowDend to Rowv.
heatmap.2(x,...,Rowv = rowDend)
This can be useful if, for example, you want to cluster the rows and columns in different ways, use a clustering that someone else has given you, or do something funky that cannot be accommodated by just specifying hclustfun and distfun. This is what is meant by "the dendrogram is honoured": it is used instead of what is specified by hclustfun and distfun.
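A minimal sketch of clustering rows and columns differently is shown here with base heatmap(), which also accepts dendrograms for Rowv and Colv, so that it runs without gplots. The mtcars subset and the choice of linkage methods are only illustrative assumptions.

```r
x <- as.matrix(mtcars[1:10, 1:5])

# Rows: manhattan distance, complete linkage, ordered by row means
rowDend <- reorder(as.dendrogram(hclust(dist(x, method = "manhattan"))),
                   rowMeans(x))

# Columns: euclidean distance, average linkage
colDend <- as.dendrogram(hclust(dist(t(x)), method = "average"))

pdf(NULL)   # null device, so no file is written; drop this to see the plot
hm <- heatmap(x, Rowv = rowDend, Colv = colDend, scale = "none")
dev.off()

# The supplied dendrograms dictate the plotted order
identical(hm$rowInd, order.dendrogram(rowDend))
identical(hm$colInd, order.dendrogram(colDend))
```

The returned row and column indices can be compared against order.dendrogram() to check that the dendrograms were used as given.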
To look into how it handles Rowv/Colv exactly, you might also use body(heatmap) to display its source.
From the manual:
distfun : function used to compute the distance (dissimilarity) between
both rows and columns. Defaults to dist.
hclustfun : function used to compute the hierarchical clustering when
Rowv or Colv are not dendrograms. Defaults to hclust. Should take as
argument a result of distfun and return an object to which
as.dendrogram can be applied.
dist() defaults to the Euclidean distance and hclust() to the complete linkage method.
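If in doubt, these defaults can be checked directly in an R session. A small sketch (the 3 x 2 matrix is arbitrary):

```r
# Default arguments as declared in the function signatures
formals(dist)$method     # "euclidean"
formals(hclust)$method   # "complete"

# The same information is recorded on the results
d  <- dist(matrix(1:6, nrow = 3))
hc <- hclust(d)
attr(d, "method")        # "euclidean"
hc$method                # "complete"
```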