Let's do a quick 3-cluster classification of the iris dataset with the FactoMineR package:
library(FactoMineR)
model <- HCPC(iris[,1:4], nb.clust = 3)
summary(model$data.clust$clust)
1 2 3
50 62 38
We see that 50 observations are in cluster 1, 62 in cluster 2 and 38 in cluster 3.
Now we want to visualize these 3 clusters in a dendrogram with the dendextend package, which makes it easy to draw nice-looking ones:
library(dendextend)
library(dplyr)
model$call$t$tree %>%
  as.dendrogram() %>%
  color_branches(k = 3, groupLabels = unique(model$data.clust$clust)) %>%
  plot()
The problem is that the labels on the dendrogram don't match the true labels of the classification. Cluster 2 should be the biggest one (62 observations according to the data), but on the dendrogram it is clearly the smallest one.
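The mismatch comes from two different numbering conventions, and it can be reproduced with base R alone. The sketch below uses stats::hclust with Ward's criterion (ward.D2) on the raw iris measurements as a stand-in for the tree built by HCPC (an assumption; HCPC clusters the factor-analysis coordinates, so its tree differs). unique() returns the labels in the order they first occur in the data rows, while the branches are drawn from left to right following the tree's $order component.

```r
# Stand-in tree (assumption): Ward clustering of the raw iris measurements
hc <- stats::hclust(stats::dist(iris[, 1:4]), method = "ward.D2")
cl <- stats::cutree(hc, k = 3)   # cluster ids, numbered in data order

# Order in which labels first occur in the data rows; this is the kind of
# order unique(model$data.clust$clust) produces in the question
data_order <- unique(unname(cl))

# Order in which the same labels occur from left to right on the dendrogram
dendro_order <- unique(unname(cl[hc$order]))

print(data_order)
print(dendro_order)
```

Whenever the two printed vectors differ, passing the first one as groupLabels attaches the labels to the wrong branches.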
I have tried different things, but nothing has worked so far. If you have any idea which input to give to groupLabels so that the branch labels match the real ones, that would be great.
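Looking inside dendextend::color_branches, we can see that group labels are assigned using the command g <- dendextend::cutree(dend, k = k, h = h, order_clusters_as_data = FALSE). With order_clusters_as_data = FALSE, the groups are numbered from left to right along the dendrogram rather than in the order of the data. This fact can be used to build a map between the cluster labels assigned by HCPC and the group labels assigned by dendextend::color_branches. This table shows that cluster labels 2 and 3 are matched with group labels 3 and 2, respectively. (Surprisingly, for two sample units this rule does not hold. A likely explanation is HCPC's default k-means consolidation step, consol = TRUE, which can move a few units after the tree has been cut.)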
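A minimal base-R sketch of such a map, under stand-in assumptions: the tree is a Ward clustering of iris rather than the HCPC tree, and the HCPC labels are simulated (hypothetical data) by permuting the tree's cluster ids and swapping the labels of two units. The left-to-right numbering is computed from hc$order, which I believe matches what dendextend::cutree() returns with order_clusters_as_data = FALSE. Each row of the table is a branch group, each column a cluster label.

```r
hc <- stats::hclust(stats::dist(iris[, 1:4]), method = "ward.D2")
cl <- stats::cutree(hc, k = 3)

# Hypothetical HCPC-like labels: same partition with permuted ids, plus two
# units whose labels are swapped to mimic a consolidation step
hcpc_like <- c(3L, 1L, 2L)[cl]
hcpc_like[c(1, 51)] <- hcpc_like[c(51, 1)]

# Branch groups numbered 1..k from left to right along the dendrogram
branch <- match(cl, unique(cl[hc$order]))

# Rows: branch groups; columns: HCPC-like cluster labels
map <- table(branch = branch, hcpc = hcpc_like)
print(map)
```

With the question's objects, hc would presumably be model$call$t$tree and hcpc_like would be model$data.clust$clust, assuming its rows are in the original data order. Each row then has one dominant column, and the few off-diagonal counts correspond to the units that do not follow the rule.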
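The group levels that need to be passed to dendextend::color_branches can therefore be read from the table: for each branch group, from left to right, take the HCPC cluster it is matched with. Given the mapping above, this gives c(1, 3, 2).
Here is the dendrogram:
[Figure: the resulting dendrogram]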
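Putting the steps together, here is a sketch of how the group levels could be computed and passed to color_branches. group_labels_for_branches() is a hypothetical helper, not part of dendextend or FactoMineR: for each branch, from left to right, it picks the most frequent external label (a majority vote, so a couple of reassigned units do not change the result). The tree and labels are again a base-R stand-in (Ward clustering of iris, with permuted and partly swapped labels) rather than real HCPC output, and the plot is only drawn if dendextend is installed.

```r
# Hypothetical helper: map an external partition onto the left-to-right
# branch order used by dendextend::color_branches(), by majority vote
group_labels_for_branches <- function(tree, clusters, k) {
  cut_data <- stats::cutree(tree, k = k)   # numbered in data order
  # Align external labels with the tree when both carry observation names
  if (!is.null(names(clusters)) && !is.null(names(cut_data))) {
    clusters <- clusters[names(cut_data)]
  }
  # Renumber groups 1..k from left to right along the dendrogram
  branch <- match(cut_data, unique(cut_data[tree$order]))
  tab <- table(branch = branch, cluster = clusters)
  # Most frequent external label in each branch, left to right
  colnames(tab)[apply(tab, 1, which.max)]
}

hc <- stats::hclust(stats::dist(iris[, 1:4]), method = "ward.D2")
cl <- stats::cutree(hc, k = 3)
hcpc_like <- c(3L, 1L, 2L)[cl]               # hypothetical permuted labels
hcpc_like[c(1, 51)] <- hcpc_like[c(51, 1)]   # two "consolidated" units

labs <- group_labels_for_branches(hc, hcpc_like, k = 3)
print(labs)

if (requireNamespace("dendextend", quietly = TRUE)) {
  dend <- dendextend::color_branches(stats::as.dendrogram(hc), k = 3,
                                     groupLabels = labs)
  plot(dend)
}
```

For the question's model, the call would presumably be group_labels_for_branches(model$call$t$tree, setNames(model$data.clust$clust, rownames(model$data.clust)), k = 3), assuming model$call$t$tree is an hclust object whose labels are the row names. Naming the cluster vector lets the helper align it with the tree even if the rows of data.clust were reordered.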