I know what k-means is, and I also understand the k-means++ algorithm. I believe the only difference is how the initial k centers are chosen.
In the ++ version, we first choose one center, and then use a probability distribution to choose the remaining k-1 centers.
In the MLlib implementation of k-means, what is the initializationSteps parameter?
To be precise, k-means++ is an algorithm for choosing the initial centers; it does not describe the whole training process.
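To make that concrete, here is a minimal sketch of k-means++ seeding in plain Python. The function names (`sq_dist`, `kmeans_pp_init`) are made up for illustration and are not part of MLlib; the sketch assumes the input contains at least k distinct points, given as tuples.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng):
    """Illustrative k-means++ seeding: returns up to k initial centers."""
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest center so far.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; nothing left to pick.
            break
        # Each further center is drawn with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
    return centers
```

Note that this produces only the starting centers. The usual Lloyd iterations (assign points, recompute means) still have to run afterwards.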
MLlib's k-means uses k-means|| for initialization, which is a distributed variant of k-means++. Instead of sampling one point at a time, it samples multiple points in each of a fixed number of iterations.
initializationSteps corresponds to the number of these iterations and, according to the original paper, should be roughly O(log n).
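The role of the parameter can be sketched in plain Python as follows. This is a single-machine illustration of the k-means|| idea, not MLlib's actual code: the names `kmeans_parallel_init` and `init_steps` are hypothetical, the oversampling factor of 2k is an assumption, and the final reduction step uses a weighted k-means++ pass as one possible choice.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_parallel_init(points, k, init_steps, rng, oversample=None):
    """Illustrative k-means|| seeding; init_steps is the number of sampling rounds."""
    ell = 2 * k if oversample is None else oversample  # assumed oversampling factor
    candidates = [rng.choice(points)]
    for _ in range(init_steps):
        d2 = [min(sq_dist(p, c) for c in candidates) for p in points]
        total = sum(d2)
        if total == 0:
            break
        # Every point is sampled independently, so one round can add many
        # candidates at once (this is the part that parallelizes well).
        candidates.extend(
            p for p, d in zip(points, d2) if rng.random() < ell * d / total
        )
    # Weight each candidate by the number of points closest to it.
    weights = [0] * len(candidates)
    for p in points:
        nearest = min(range(len(candidates)), key=lambda i: sq_dist(p, candidates[i]))
        weights[nearest] += 1
    # Reduce the (small) candidate set to k centers with weighted k-means++.
    first = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
    centers = [candidates[first]]
    d2 = [sq_dist(c, centers[0]) for c in candidates]
    while len(centers) < k:
        w = [wt * d for wt, d in zip(weights, d2)]
        if sum(w) == 0:
            break  # fewer than k distinct candidates were collected
        idx = rng.choices(range(len(candidates)), weights=w, k=1)[0]
        centers.append(candidates[idx])
        d2 = [min(d, sq_dist(c, candidates[idx])) for d, c in zip(d2, candidates)]
    return centers
```

With `init_steps=0` only the single uniformly chosen point is available, so the sketch returns one center; more rounds give the reduction step a richer candidate set to choose from. In the Scala API the corresponding setter is `setInitializationSteps`.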