出口树状与R表(Exporting dendrogram as table in R)

2019-07-28 21:46发布

我想来自R导出hclust,树形图到数据表,以便随后将其导入到另一个(“自制”)的软件。 str(unclass(fit))提供树状文本概述,但我正在寻找的是一个真正的数字表。 我看了看Bioconductor的CTC包,但输出它的产生看起来有点cryptical。 我想有类似这样的桌上的东西: http://stn.spotfire.com/spotfire_client_help/heat/heat_importing_exporting_dendrograms.htm有没有办法让这出R中的hclust对象的?

Answer 1:

在任何情况下也有兴趣在树状图的出口,这里是我的解决方案。 最可能的是,它不是最好的一个,因为我开始,使用R只是在最近,但至少它的工作原理。 因此,如何提高代码的建议是值得欢迎的。

所以,如果hr是我的hclust对象和df是我的数据,第一列,其中包含从0开始一个简单的索引和行的名称是群集项的名称:

# Retrieve the leaf order (row name and its position within the leaves)
leaf.order <- matrix(data=NA, ncol=2, nrow=nrow(df),
              dimnames=list(c(), c("row.num", "row.name")))
leaf.order[,2] <- hr$labels[hr$order]
for (i in 1:nrow(leaf.order)) {
   leaf.order[which(leaf.order[,2] %in% rownames(df[i,])),1] <- df[i,1]
}
leaf.order <- as.data.frame(leaf.order)

hr.merge <- hr$merge
n <- max(df[,1])

# Re-index all clustered leaves and nodes. First, all leaves are indexed starting from 0.
# Next, all nodes are indexed starting from max. index leave + 1.
for (i in 1:length(hr.merge)) {
  if (hr.merge[i]<0) {hr.merge[i] <- abs(hr.merge[i])-1}
  else { hr.merge[i] <- (hr.merge[i]+n) }
}
node.id <- c(0:length(hr.merge))

# Generate dendrogram matrix with node index in the first column.
dend <- matrix(data=NA, nrow=length(node.id), ncol=6,
           dimnames=list(c(0:(length(node.id)-1)),
              c("node.id", "parent.id", "pruning.level",
              "height", "leaf.order", "row.name")) )
dend[,1] <- c(0:((2*nrow(df))-2))  # Insert a leaf/node index

# Calculate parent ID for each leaf/node:
# 1) For each leaf/node index, find the corresponding row number within the merge-table.
# 2) Add the maximum leaf index to the row number as indexing the nodes starts after indexing all the leaves.
for (i in 1:(nrow(dend)-1)) {
  dend[i,2] <- row(hr.merge)[which(hr.merge %in% dend[i,1])]+n
}

# Generate table with indexing of all leaves (1st column) and inserting the corresponding row names into the 3rd column.
hr.order <- matrix(data=NA,
           nrow=length(hr$labels), ncol=3,
           dimnames=list(c(), c("order.number", "leaf.id", "row.name")))
hr.order[,1] <- c(0:(nrow(hr.order)-1))
hr.order[,3] <- t(hr$labels[hr$order])
hr.order <- data.frame(hr.order)
hr.order[,1] <- as.numeric(hr.order[,1])

# Assign the row name to each leaf.
dend <- as.data.frame(dend)
for (i in 1:nrow(df)) {
      dend[which(dend[,1] %in% df[i,1]),6] <- rownames(df[i,])
}

# Assign the position on the dendrogram (from left to right) to each leaf.
for (i in 1:nrow(hr.order)) {
      dend[which(dend[,6] %in% hr.order[i,3]),5] <- hr.order[i,1]-1
}

# Insert height for each node.
dend[c((n+2):nrow(dend)),4] <- hr$height

# All leaves get the highest possible pruning level
dend[which(dend[,1] <= n),3] <- nrow(hr.merge)

# The nodes get a decreasing index starting from the pruning level of the
# leaves minus 1 and up to 0

for (i in (n+2):nrow(dend)) {
   if ((dend[i,4] != dend[(i-1),4]) || is.na(dend[(i-1),4])){
        dend[i,3] <- dend[(i-1),3]-1}
      else { dend[i,3] <- dend[(i-1),3] }
}
dend[,3] <- dend[,3]-min(dend[,3])

dend <- dend[order(-node.id),]

# Write results table.
write.table(dend, file="path", sep=";", row.names=F)


Answer 2:

有包,做完全相反的,你想要的东西- Labeltodendro ;-)

但严重的是,你就不能手动提取的元素hclust对象(例如$merge$height$order ),并从提取的元素创建自定义的表?



文章来源: Exporting dendrogram as table in R