How to draw the plot of within-cluster sum-of-squa

I have a cluster plot made in R, and I want to apply the "elbow criterion" for choosing the number of clusters using a WSS (within-cluster sum of squares) plot. I do not know how to draw a WSS plot for a given clustering. Could anyone help?

Here is my data:

Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096)
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1)
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029)
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067)
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188)
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108)
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046)
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088)
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025)
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1)
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146)
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442)
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)

data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)

And here is my code of clustering:

cor <- cor (data)
dist<-dist(cor)
hclust<-hclust(dist)
plot(hclust)

Running the code above gives me a dendrogram. How can I draw a WSS plot from it, like this one?

[image: example WSS plot]

Tags: r plot cluster-analysis hierarchical-clustering

1 answer

冷血范

#2 · 2019-02-03 20:43

If I follow what you want, then we need a function to compute WSS

wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}

and a wrapper for this wss() function

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)      # cluster membership when cutting into i clusters
  spl <- split(x, cl)      # one data frame per cluster
  sum(sapply(spl, wss))    # total WSS over all clusters
}

This wrapper takes the following arguments:

i: the number of clusters to cut the data into
hc: the hierarchical cluster analysis object
x: the original data (a data frame, so that split() divides it by rows)

wrap then cuts the dendrogram into i clusters, splits the original data according to the cluster membership given by cl, and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering.
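
As a sanity check on the two functions above, the following sketch compares the split-and-sum total with a direct computation of squared distances to the cluster centroids. The iris data, the choice of 3 clusters, and the names by_split, by_hand and one_cluster are assumptions made for illustration only.

```r
# WSS for one cluster: squared deviations from the column means
wss <- function(d) sum(scale(d, scale = FALSE)^2)

x  <- iris[, 1:4]                        # numeric columns only
hc <- hclust(dist(x), method = "ward.D")
cl <- cutree(hc, 3)                      # illustrative choice of 3 clusters

# Total WSS the way wrap() computes it: split rows by cluster, sum per-cluster WSS
by_split <- sum(sapply(split(x, cl), wss))

# The same quantity by hand: replace each value by its cluster mean, then
# sum the squared differences between the data and those centroids
centroids <- apply(x, 2, function(col) ave(col, cl))
by_hand   <- sum((as.matrix(x) - centroids)^2)

print(all.equal(by_split, by_hand))

# With a single cluster, WSS is (n - 1) times the sum of the column variances
one_cluster <- (nrow(x) - 1) * sum(apply(x, 2, var))
print(all.equal(wss(x), one_cluster))
```

If the two comparisons agree, the value plotted for each number of clusters is the usual total within-cluster sum of squares.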

We run all of this using sapply() over the number of clusters 1, 2, ..., nrow(data), where cl is the object returned by hclust() and data is the data that was clustered:

res <- sapply(seq.int(1, nrow(data)), wrap, hc = cl, x = data)

A screeplot can be drawn using

plot(seq_along(res), res, type = "b", pch = 19)
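
In the question the dendrogram comes from hclust(dist(cor(data))), so the objects being clustered are the rows of the correlation matrix rather than the rows of data. A minimal sketch of that case follows, under the assumption that WSS should be measured on those same rows. The built-in mtcars data stands in for the question's data, and cm and hc_vars are hypothetical names.

```r
wss <- function(d) sum(scale(d, scale = FALSE)^2)

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

cm      <- cor(mtcars)           # correlation matrix of the stand-in data
hc_vars <- hclust(dist(cm))      # clusters the rows of cm, as in the question
x       <- as.data.frame(cm)     # data frame, so split() divides by rows

# One total WSS value per possible number of clusters
res <- sapply(seq_len(nrow(x)), wrap, hc = hc_vars, x = x)

plot(seq_along(res), res, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "Total WSS")
```

The curve falls to zero once every variable is its own cluster, so the elbow, if there is one, will be among the smaller cluster counts.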

Here is an example using the well-known Edgar Anderson Iris data set:

iris2 <- iris[, 1:4]  # drop Species column
cl <- hclust(dist(iris2), method = "ward.D")

## Takes a little while as we evaluate all implied clusterings up to 150 groups
res <- sapply(seq.int(1, nrow(iris2)), wrap, hc = cl, x = iris2)
plot(seq_along(res), res, type = "b", pch = 19)

This gives:

[figure: WSS against number of clusters, 1 to 150]

We can zoom in by showing only the first 50 clusters:

plot(seq_along(res[1:50]), res[1:50], type = "o", pch = 19)

which gives

[figure: WSS against number of clusters, 1 to 50]

You can speed up the main computation step either by running the sapply() via an appropriate parallelised alternative, or by doing the computation for fewer than nrow(data) clusters, e.g.

res <- sapply(seq.int(1, 50), wrap, hc = cl, x = iris2) ## 1st 50 groups
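
One possible parallelised alternative is mclapply() from the parallel package that ships with R. This is only a sketch: it assumes a Unix-like system for real parallelism (forking is not available on Windows, so it falls back to one core there), and the choice of 2 cores and the names n_cores, res_par and res_seq are arbitrary.

```r
library(parallel)

wss <- function(d) sum(scale(d, scale = FALSE)^2)

wrap <- function(i, hc, x) {
  cl <- cutree(hc, i)
  sum(sapply(split(x, cl), wss))
}

iris2 <- iris[, 1:4]
cl <- hclust(dist(iris2), method = "ward.D")

# mclapply() forks the R process; on Windows only mc.cores = 1 is allowed
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Evaluate the first 50 cluster counts across the available cores
res_list <- mclapply(seq_len(50), function(i) wrap(i, cl, iris2),
                     mc.cores = n_cores)
res_par <- unlist(res_list)

# Sequential version for comparison
res_seq <- sapply(seq_len(50), wrap, hc = cl, x = iris2)
print(all.equal(res_par, res_seq))
```

For a data set as small as iris the forking overhead may outweigh any gain, so this is mainly worth trying when there are many more observations.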
