I have a set of data clustering into k groups, each cluster has a minimum size constraint of m
I've done some reclustering of the data. So now I got this set of points that each one has one or more better clusters to be in, but cannot be switched individually because it'll violate the size constraint.
Goal: minimize the sum of distance from each point to its cluster center.
Subject to: Minimum cluster size m
I want to find an algorithm to reassign all points without violating the constraint, while guaranteed to decrease the objective.
I thought of using Graph to represent pair wise relationship between points. But I'm not sure how to do the reassignment since it exists the possibility of a big dense cycle, and I got lost in swapping multiple points between multiple clusters.
I also created a list of cluster pairs with possible swapping candidates, but still couldn't find the way to optimal the objective.
I hope I explained my situation. I'm new to algorithm, and not familiar with the jargon and rules. If any other information needed, please let me know.
I've done a lot of research, I've tried out the algorithm in this paper, but without success, since the sum of membership degree doesn't necessarily correlate to cluster size. Clustering with Size Constraints
I've also read other similar posts on SO, but didn't find a detail algorithm I could implement.
I've tried to construct a weighted directed graph, with vertex representing clusters and edges from A to B represents points in cluster A willing to relocate to cluster B. and weight to be the sum of points
But with my data, it turns out all the nodes to be in a huge cycle with very dense edges. Because of my limited experience, I still could not figure out how to reassign among so many clusters. Any suggestions are appreciated!
Something like this.