how to set Spark Kmeans initial centers

2019-08-02 16:20发布

I'm using Spark ML for run Kmeans. I have bunch of data and three existing centers, for example the three centers are:[1.0,1.0,1.0],[5.0,5.0,5.0],[9.0,9.0,9.0]. So how can I indicate the Kmeans centers are the above three vectors. I saw Kmean object has seed parameter, but the seed parameter is an long type not an array. So how can I tell Spark Kmeans to only use the existing centers for clustering.

Or say, I didn't understand what does seed mean in Spark Kmeans, I suppose the seeds should be an array of vectors which represents the specified centers before running clustering.

标签： apache-spark machine-learning cluster-analysis k-means apache-spark-mllib

1条回答

The star\"

2楼-- · 2019-08-02 17:03

Indeed, seed does not mean what you think, i.e. it is not used for 'seeding' (initializing) the cluster centers, but simply for setting the random seed - you can confirm this in the documentation for the Scala and Python APIs.

To the best of my knowledge, there is currently (Spark 2.1) no way for supplying initial cluster centers for k-means in Spark ML (see this answer for Spark MLlib). The initMode parameter, according to the documentation:

can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++')

0人赞添加讨论(0) 举报

how to set Spark Kmeans initial centers

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间