I have made a cluster plot in R, and I want to use the "elbow criterion" with a WSS (within-cluster sum of squares) plot to choose the number of clusters. However, I do not know how to draw a WSS plot for a given clustering. Could anyone help me?
Here is my data:
Friendly<-c(0.467,0.175,0.004,0.025,0.083,0.004,0.042,0.038,0,0.008,0.008,0.05,0.096)
Polite<-c(0.117,0.55,0,0,0.054,0.017,0.017,0.017,0,0.017,0.008,0.104,0.1)
Praising<-c(0.079,0.046,0.563,0.029,0.092,0.025,0.004,0.004,0.129,0,0,0,0.029)
Joking<-c(0.125,0.017,0.054,0.383,0.108,0.054,0.013,0.008,0.092,0.013,0.05,0.017,0.067)
Sincere<-c(0.092,0.088,0.025,0.008,0.383,0.133,0.017,0.004,0,0.063,0,0,0.188)
Serious<-c(0.033,0.021,0.054,0.013,0.2,0.358,0.017,0.004,0.025,0.004,0.142,0.021,0.108)
Hostile<-c(0.029,0.004,0,0,0.013,0.033,0.371,0.363,0.075,0.038,0.025,0.004,0.046)
Rude<-c(0,0.008,0,0.008,0.017,0.075,0.325,0.313,0.004,0.092,0.063,0.008,0.088)
Blaming<-c(0.013,0,0.088,0.038,0.046,0.046,0.029,0.038,0.646,0.029,0.004,0,0.025)
Insincere<-c(0.075,0.063,0,0.013,0.096,0.017,0.021,0,0.008,0.604,0.004,0,0.1)
Commanding<-c(0,0,0,0,0,0.233,0.046,0.029,0.004,0.004,0.538,0,0.146)
Suggesting<-c(0.038,0.15,0,0,0.083,0.058,0,0,0,0.017,0.079,0.133,0.442)
Neutral<-c(0.021,0.075,0.017,0,0.033,0.042,0.017,0,0.033,0.017,0.021,0.008,0.717)
data <- data.frame(Friendly,Polite,Praising,Joking,Sincere,Serious,Hostile,Rude,Blaming,Insincere,Commanding,Suggesting,Neutral)
And here is my clustering code:
cor_mat <- cor(data)
d <- dist(cor_mat)
hc <- hclust(d)
plot(hc)
Running the code above gives me a dendrogram, but how can I draw an elbow plot of WSS against the number of clusters, like this one?
[Figure: example elbow plot]
If I follow what you want, then we need a function to compute WSS, wss(), and a wrapper around it, wrap(). The wrapper takes the following arguments:

- i: the number of clusters to cut the data into
- hc: the hierarchical cluster analysis object
- x: the original data

wrap() then cuts the dendrogram into i clusters, splits the original data according to the resulting cluster memberships (cl), and computes the WSS for each cluster. These WSS values are summed to give the WSS for that clustering. We run all of this using sapply() over the number of clusters 1, 2, ..., nrow(data).
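A minimal sketch of wss() and wrap() as described above. The function bodies are an assumption based on that description: wss() is taken to be the sum of squared deviations of a cluster's rows from its column means (scale(d, scale = FALSE)), and wrap() uses cutree() to cut the tree. Because the question clusters the rows of cor(data), the sketch passes that correlation matrix as x. The built-in mtcars data stands in for the question's data so the block runs on its own.

```r
# WSS of one cluster: squared deviations of its rows from the cluster's column means
wss <- function(d) {
  sum(scale(d, scale = FALSE)^2)
}

# Cut the tree into i clusters and sum the per-cluster WSS
wrap <- function(i, hc, x) {
  cl  <- cutree(hc, k = i)            # cluster membership of each row of x
  spl <- split(as.data.frame(x), cl)  # as.data.frame() so split() works by rows
  sum(sapply(spl, wss))               # total within-cluster sum of squares
}

# Mirror the question's pipeline (assumption: mtcars replaces the question's data):
# cluster the variables using the rows of their correlation matrix
cor_mat <- cor(mtcars)
hc <- hclust(dist(cor_mat))

# One WSS value for each number of clusters 1, 2, ..., nrow(cor_mat)
res <- sapply(seq_len(nrow(cor_mat)), wrap, hc = hc, x = cor_mat)
res
```

With the question's data, replace mtcars by data. The first element of res is the total sum of squares, and the last is 0 because every variable is then its own cluster.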
A scree plot can then be drawn by plotting these summed WSS values against the number of clusters.
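Here is a self-contained sketch of the scree plot. It inlines the same WSS computation as an anonymous function and plots the totals with type = "b" (points joined by lines). As before, mtcars is only a stand-in for the question's data, and the axis labels are illustrative.

```r
# Assumed setup mirroring the question: cluster variables via their correlation matrix
x  <- cor(mtcars)   # mtcars stands in for the question's data
hc <- hclust(dist(x))

# Total within-cluster sum of squares for k = 1, ..., nrow(x) clusters
k <- seq_len(nrow(x))
res <- sapply(k, function(i) {
  groups <- split(as.data.frame(x), cutree(hc, k = i))
  sum(sapply(groups, function(d) sum(scale(d, scale = FALSE)^2)))
})

# Scree (elbow) plot: look for the k after which the curve stops dropping steeply
plot(k, res, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```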
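Here is an example using the well-known Edgar Anderson Iris data set. This gives:
[Figure: scree plot of WSS against the number of clusters for the Iris data]
We can zoom in by showing just the first 50 clusters, which gives:
[Figure: the same scree plot restricted to 1 to 50 clusters]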
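The Iris example can be reproduced roughly as follows. Clustering on the four numeric measurements (iris[, 1:4]) with hclust()'s default complete linkage is an assumption, as are the wss() and wrap() bodies, which follow the description given earlier.

```r
# WSS helpers as described above (bodies are an assumption)
wss  <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) {
  cl <- cutree(hc, k = i)          # memberships for i clusters
  sum(sapply(split(x, cl), wss))   # x is a data frame, so split() works by rows
}

# Assumption: cluster the 150 flowers on their four measurements
x  <- iris[, 1:4]
hc <- hclust(dist(x))
res <- sapply(seq_len(nrow(x)), wrap, hc = hc, x = x)

# Full scree plot over 1..150 clusters
plot(seq_along(res), res, type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "WSS")

# Zoom in on the first 50 clusters
plot(1:50, res[1:50], type = "b", pch = 19,
     xlab = "Number of clusters", ylab = "WSS")
```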
You can speed up the main computation step either by running the sapply() call through an appropriate parallelised alternative, or by doing the computation for fewer than nrow(data) clusters.
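Here is a sketch of both speed-ups, assuming the Iris setup. The first restricts the range of cluster numbers (the first 20 here, an arbitrary choice). The second replaces sapply() with parallel::mclapply(). mclapply() forks worker processes, which Windows does not support, so the sketch falls back to one core there. parLapply() with a socket cluster is the usual Windows alternative.

```r
library(parallel)

# WSS helpers as described above (bodies are an assumption)
wss  <- function(d) sum(scale(d, scale = FALSE)^2)
wrap <- function(i, hc, x) sum(sapply(split(x, cutree(hc, k = i)), wss))

x  <- iris[, 1:4]
hc <- hclust(dist(x))

# Option 1: only compute WSS for the first 20 cluster numbers
res20 <- sapply(1:20, wrap, hc = hc, x = x)

# Option 2: parallelise over cluster numbers; extra arguments go to wrap() via ...
cores   <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- unlist(mclapply(seq_len(nrow(x)), wrap, hc = hc, x = x,
                           mc.cores = cores))
```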