After reading Unbalanced factor of KMeans, I am trying to understand how this factor works. From my examples, I can see that the smaller the factor, the better the quality of the KMeans clustering, i.e. the more balanced its clusters are. But what is the plain mathematical interpretation of this factor? Is it a known quantity?
Here are my examples:
C1 = 10
C2 = 100
pdd = [(C1,10), (C2, 100)]
n = 2 <-- #clusters
total = 110 <-- #points
uf = 10 * 10 + 100 * 100
uf = 10100 * 2 / 12100 = 1.669
C1 = 50
C2 = 60
pdd = [(C1, 50), (C2, 60)]
n = 2
total = 110
uf = 2500 + 3600
uf = 6100 * 2 / 12100 = 1.008
C1 = 1
C2 = 1
pdd = [(C1, 1), (C2, 1)]
n = 2
total = 2
uf = 2
uf = 2 * 2 / (2 * 2) = 1
See also Cross Validated: Understanding the quality of the KMeans algorithm.
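To double-check the arithmetic in the three examples, here is a minimal sketch in plain Python. It assumes the factor is computed the way the examples do it: the sum of squared cluster sizes, multiplied by the number of clusters n, divided by the square of the total number of points. The function name `unbalanced_factor` is my own and does not come from any library.

```python
def unbalanced_factor(sizes):
    """Return n * sum(s_i^2) / (sum(s_i))^2 for a list of cluster sizes.

    Hypothetical helper: the formula is the one used in the examples above,
    not an API of any clustering library.
    """
    n = len(sizes)          # number of clusters
    total = sum(sizes)      # number of points
    if n == 0 or total == 0:
        raise ValueError("need at least one non-empty cluster")
    return n * sum(s * s for s in sizes) / (total * total)


for sizes in ([10, 100], [50, 60], [1, 1]):
    print(sizes, round(unbalanced_factor(sizes), 3))
# [10, 100] 1.669
# [50, 60] 1.008
# [1, 1] 1.0
```

With this formula the value is exactly 1 when all clusters have the same size and grows as the sizes diverge, which matches what the examples suggest.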