I'm trying to use an awesome R package named dendextend to plot a dendrogram and color its branches and labels according to a set of previously defined groups.
I've read your answers on Stack Overflow and the FAQ in the dendextend vignette, but I'm still not sure how to achieve my goal.
Let's imagine I have a data frame whose first column holds the names of the individuals to cluster, followed by several columns with the factors to be analyzed, and a last column with the group each individual belongs to (see the table below).
individual 282856 282960 283275 283503 283572 283614 284015 group
pat15612 0 0 0 0 0 0 0 g2
pat38736 0 0 0 0 0 0 0 g2
pat38740 0 0 0 0 0 1 0 g2
pat38742 0 0 0 0 0 1 0 g4
pat38743 0 0 1 0 0 1 0 g3
pat38745 0 0 1 0 1 0 0 g4
pat38750 0 0 0 1 0 1 0 g4
pat38753 0 0 0 1 0 0 0 g3
pat40120 0 0 0 0 1 0 0 g4
pat40124 0 0 0 0 1 0 0 g4
pat40125 0 0 0 0 1 1 0 g4
pat40126 0 0 0 1 0 0 0 g4
pat40137 1 0 0 0 0 0 0 g4
pat40142 0 1 0 0 0 0 0 g5
pat46903 0 0 0 0 0 1 0 g1
pat67612 1 0 0 0 1 0 0 g1
pat67621 0 0 0 0 0 0 0 g2
pat67630 0 0 1 0 0 0 0 g2
pat67634 0 0 0 0 0 0 0 g5
pat67657 0 1 0 1 0 0 0 g5
pat67680 0 0 0 0 0 1 0 g5
pat67683 0 0 1 1 0 0 0 g6
How can I color the branch and label of each individual according to the group it belongs to, even though members of the same group may end up in different clusters?
If this can be achieved, is there a way to define the color assigned to each group?
I'm glad you solved this on your own.
The simpler solution is to use the order_value = TRUE argument in the set function. For example:
library(dendextend)
iris2 <- iris[,-5]
rownames(iris2) <- paste(iris[,5],iris[,5],iris[,5], rownames(iris2))
dend <- iris2 %>% dist %>% hclust %>% as.dendrogram
dend <- dend %>% set("labels_colors", as.numeric(iris[,5]), order_value = TRUE) %>%
set("labels_cex", .5)
par(mar = c(4,1,0,8))
plot(dend, horiz = TRUE)
This results in the following plot (as you can see, the colors of the labels are based on a separate variable, "Species", in the iris dataset):
(p.s.: I tripled the number of times a species appears in order to make it easier to see how the color relates to the length of the label)
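To pick the color of each group explicitly, one option is to look the groups up in a named vector before passing them to set. The following is a minimal sketch, not a definitive recipe: the six rows are copied from the table in the question, the color names are the ones used elsewhere in this thread, and the object names `x`, `groups` and `group_colors` are made up for illustration.

```r
library(dendextend)

# Six rows copied from the question's table (one individual per group)
x <- matrix(c(
  0, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 1, 0,
  0, 0, 1, 0, 1, 0, 0,
  0, 1, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 1, 0,
  0, 0, 1, 1, 0, 0, 0
), ncol = 7, byrow = TRUE,
dimnames = list(
  c("pat15612", "pat38743", "pat38745", "pat40142", "pat46903", "pat67683"),
  c("282856", "282960", "283275", "283503", "283572", "283614", "284015")
))

# Group of each row, in the same order as the rows of x
groups <- c("g2", "g3", "g4", "g5", "g1", "g6")

# Hypothetical mapping from group to color; change it to taste
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

dend <- as.dendrogram(hclust(dist(x)))

# The colors are given in data order; order_value = TRUE reorders them
# to match the order of the leaves in the dendrogram
dend <- set(dend, "labels_colors", unname(group_colors[groups]),
            order_value = TRUE)

plot(dend, main = "Labels colored by group")
```

Calling labels_colors(dend) afterwards is a quick way to check that each label received the color of its group.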
I was able to do it using another package called "sparcl". I did it based on a previous post (How to colour the labels of a dendrogram by an additional factor variable in R).
Here is my code:
#load the dataset.....
#calculate distances
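# stats::dist() has no "Jaccard" method; for 0/1 data its "binary" method is the Jaccard distance
# (keep "Jaccard" only if dist() comes from a package such as proxy)
d <- dist(dataset2, method="binary")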
## Hierarchical cluster the data
hc <- hclust(d)
dend <- as.dendrogram(hc)
#create labels
labs=dataset$individual
#format to dendrogram
hcd = as.dendrogram(hc)
plot(hcd, cex=0.6)
# factor variable for colours
Var = dataset$group
# convert group labels to colours
varCol = gsub("g1.*","green",Var)
varCol = gsub("g2.*","gold",varCol)
varCol = gsub("g3.*","pink",varCol)
varCol = gsub("g4.*","purple",varCol)
varCol = gsub("g5.*","blue",varCol)
varCol = gsub("g6.*","red",varCol)
#colour-code dendrogram branches by a factor
library(sparcl)
ColorDendrogram(hc, y=varCol, branchlength=0.9, labels=labs,
xlab="", ylab="", sub="")
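The chain of gsub calls can also be written as a single lookup in a named vector, which keeps the group-to-colour mapping in one place. This is only a sketch: `group_colors` is a made-up name, and `Var` here is a short stand-in for dataset$group.

```r
# Stand-in for dataset$group (a few group labels from the question)
Var <- c("g2", "g4", "g3", "g1", "g5", "g6")

# One entry per group: name = group label, value = colour
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

# Index the mapping by group label; as.character() also handles factors
varCol <- unname(group_colors[as.character(Var)])

# A group missing from the mapping would show up as NA
stopifnot(!anyNA(varCol))

print(varCol)
# "gold" "purple" "pink" "green" "blue" "red"
```

Unlike the gsub patterns, the lookup matches whole labels only, so a label such as "g10" would not be picked up by the "g1" entry.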
Finally, I managed to work out a "dendextend" solution based on the example in this post (How to colour the labels of a dendrogram by an additional factor variable in R):
# install.packages("dendextend")
library(dendextend)
#load the dataset.....
dataset2 <- dataset[, 1:7]  # same dataset as in the example
#calculate the dendrogram
dend <- as.dendrogram(hclust(dist(dataset2)))
#capture the colors from the "group" column
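# as.factor() is needed when "group" is a character column (otherwise as.numeric() returns NA)
colors_to_use <- as.numeric(as.factor(dataset$group))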
colors_to_use
# sort the colors based on their order in dend:
colors_to_use <- colors_to_use[order.dendrogram(dend)]
colors_to_use
#Apply colors
labels_colors(dend) <- colors_to_use
# Patient labels have a color based on their group
labels_colors(dend)
plot(dend, main = "Color in labels")
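The code above colors only the labels. To also color the branch leading to each individual, one possible approach uses base R's dendrapply and needs no extra package. Treat it as a sketch under assumptions: the six rows are copied from the question's table, and `groups`, `group_colors` and `color_leaf` are names invented for this example.

```r
# Six rows copied from the question's table (one individual per group)
x <- matrix(c(
  0, 0, 0, 0, 0, 0, 0,
  0, 0, 1, 0, 0, 1, 0,
  0, 0, 1, 0, 1, 0, 0,
  0, 1, 0, 0, 0, 0, 0,
  0, 0, 0, 0, 0, 1, 0,
  0, 0, 1, 1, 0, 0, 0
), ncol = 7, byrow = TRUE,
dimnames = list(
  c("pat15612", "pat38743", "pat38745", "pat40142", "pat46903", "pat67683"),
  NULL
))

# Group of each individual, named by individual so it can be looked up by label
groups <- c(pat15612 = "g2", pat38743 = "g3", pat38745 = "g4",
            pat40142 = "g5", pat46903 = "g1", pat67683 = "g6")

# Hypothetical group-to-color mapping
group_colors <- c(g1 = "green", g2 = "gold", g3 = "pink",
                  g4 = "purple", g5 = "blue", g6 = "red")

dend <- as.dendrogram(hclust(dist(x)))

# For every leaf, set the label color (nodePar) and the color of the
# edge that leads to it (edgePar); inner nodes are returned unchanged
color_leaf <- function(node) {
  if (is.leaf(node)) {
    col <- group_colors[[groups[[attr(node, "label")]]]]
    attr(node, "nodePar") <- list(lab.col = col, pch = NA)
    attr(node, "edgePar") <- list(col = col)
  }
  node
}
dend_col <- dendrapply(dend, color_leaf)

plot(dend_col, main = "Labels and leaf branches colored by group")
```

Only the final edge of each leaf is colored. Inner branches keep the default color, because a branch that joins individuals from different groups has no single group to take its color from.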