What exactly is the initializationSteps parameter

Posted 2019-09-06 20:22

Question:

I know what k-means is and I also understand what k-means++ algorithm is. I believe the only change is the way the initial K centers are found.

In the ++ version, we choose the first center at random and then choose the remaining k-1 centers one at a time using a probability distribution.

In the MLlib algorithm for k-means, what is the initializationSteps parameter?

Answer 1:

To be precise, k-means++ is an algorithm for choosing the initial centers; it does not describe the whole training process.

MLlib k-means uses k-means|| for initialization, which is a distributed variant of k-means++. Instead of sampling one point at a time, it samples multiple points in each iteration (round) and repeats this for a fixed number of rounds.

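initializationSteps is the number of those sampling iterations (rounds). According to the original paper, a logarithmic number of rounds (roughly O(log n)) is enough for the theoretical guarantee, and the paper also reports that a small constant number of rounds works well in practice.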
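
To make the role of initializationSteps more concrete, here is a minimal single-machine sketch of a k-means||-style initialization in plain Python. It is not Spark's implementation: the function names, the oversampling factor of 2k, and the final reduction step (weighted k-means++ seeding without any refinement) are assumptions chosen for illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost_to_nearest(p, centers):
    """Squared distance from p to its closest center."""
    return min(sq_dist(p, c) for c in centers)


def weighted_kmeans_pp(candidates, weights, k, rng):
    """Reduce weighted candidates to at most k centers (k-means++ style seeding)."""
    centers = [rng.choices(candidates, weights=weights, k=1)[0]]
    while len(centers) < k:
        scores = [w * cost_to_nearest(c, centers)
                  for c, w in zip(candidates, weights)]
        if sum(scores) == 0:
            break  # no candidate left that differs from the chosen centers
        centers.append(rng.choices(candidates, weights=scores, k=1)[0])
    return centers


def kmeans_parallel_init(points, k, initialization_steps=2,
                         oversampling=None, seed=0):
    """Sketch of k-means|| initialization; returns (centers, candidate count)."""
    rng = random.Random(seed)
    # Assumption for this sketch: oversample about 2k points per round.
    ell = oversampling if oversampling is not None else 2 * k
    candidates = [rng.choice(points)]  # start from one random point

    # initialization_steps is the number of sampling rounds.
    for _ in range(initialization_steps):
        costs = [cost_to_nearest(p, candidates) for p in points]
        total = sum(costs)
        if total == 0:
            break  # every point already coincides with a candidate
        # Each point is sampled independently, so one round can add many points.
        sampled = [p for p, c in zip(points, costs)
                   if rng.random() < min(1.0, ell * c / total)]
        candidates.extend(sampled)

    # Weight each candidate by the number of points closest to it.
    weights = [0] * len(candidates)
    for p in points:
        nearest = min(range(len(candidates)),
                      key=lambda i: sq_dist(p, candidates[i]))
        weights[nearest] += 1

    return weighted_kmeans_pp(candidates, weights, k, rng), len(candidates)


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
            for cx, cy in [(0, 0), (10, 10), (-10, 10)] for _ in range(50)]
    for steps in (0, 1, 2, 5):
        centers, n_candidates = kmeans_parallel_init(data, k=3,
                                                     initialization_steps=steps)
        print(steps, n_candidates, len(centers))
```

In this sketch each round is one pass over the data, so the parameter trades extra passes for a larger pool of candidates from which the final k centers are picked. With zero rounds only the single starting point is available, so fewer than k centers can come back.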