When using Weka's K-means, you can call getAssignments() on the resulting model to get the cluster assignment for each given instance. Here is a (truncated) Jython example:
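>>> import weka.clusterers.SimpleKMeans as SimpleKMeans
>>> kmeans = SimpleKMeans()
>>> kmeans.setPreserveInstancesOrder(True)  # getAssignments() needs the instance order preserved
>>> kmeans.buildClusterer(data)
>>> assignments = kmeans.getAssignments()
>>> assignments
array('i',[14, 16, 0, 0, 0, 0, 16,...])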
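The index of each cluster number corresponds to the instance. So, for example, instance 0 is in cluster 14, instance 1 is in cluster 16, and so on.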
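To make that mapping concrete, here is a small sketch in plain Python (it should also run under Jython) that groups instance indices by cluster number. The helper name `group_by_cluster` is made up for illustration, and the list contains only the seven values visible in the truncated output above, not the full result.

```python
def group_by_cluster(assignments):
    """Map each cluster number to the list of instance indices assigned to it."""
    groups = {}
    for index, cluster in enumerate(assignments):
        groups.setdefault(cluster, []).append(index)
    return groups


# Only the seven values visible in the truncated output; the rest is unknown.
visible = [14, 16, 0, 0, 0, 0, 16]
groups = group_by_cluster(visible)
```

With a real model, the same function could presumably be given the array returned by getAssignments() directly, since it only needs something it can iterate over.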
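My question is: is there something similar for XMeans? I went through the entire API documentation and didn't see anything like it.
Here is the reply to my question from the Weka mailing list: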
"Not as such. But all clusterers have a clusterInstance() method. You can
pass each training instance through the trained clustering model to
obtain the cluster index for each."
Here is my Jython implementation of that suggestion:
>>> import java.io.BufferedReader as BufferedReader
>>> import java.io.FileReader as FileReader
>>> import weka.core.Instances as Instances
>>> import weka.clusterers.XMeans as XMeans
>>> reader = BufferedReader(FileReader("some arff file"))
>>> data = Instances(reader)  # load the dataset from the ARFF file
>>> reader.close()
>>> xmeans = XMeans()
>>> xmeans.setMaxNumClusters(100)
>>> xmeans.setMinNumClusters(2)
>>> xmeans.buildClusterer(data)  # here's our model
>>> instances = data.enumerateInstances()  # enumeration over the training instances
>>> for index, instance in enumerate(instances):
...     cluster_num = xmeans.clusterInstance(instance)  # pass each instance through the model
...     print "instance #", index, "is in cluster", cluster_num  # pretty print results
...
instance # 0 is in cluster 1
instance # 1 is in cluster 1
instance # 2 is in cluster 0
instance # 3 is in cluster 0
I'm leaving all of this here for reference, since the same approach can be used to get cluster assignments from any of Weka's clusterers.
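As a rough sketch of that general pattern, the loop can be wrapped in a helper that returns a list shaped like the output of getAssignments(). Both `cluster_assignments` and `ThresholdClusterer` are names invented for this example. The stand-in class exists only so the snippet runs without Weka; the one thing it shares with a real clusterer is a clusterInstance() method.

```python
def cluster_assignments(clusterer, instances):
    """Return a list whose i-th entry is the cluster index of the i-th instance."""
    return [clusterer.clusterInstance(instance) for instance in instances]


class ThresholdClusterer(object):
    """Made-up stand-in for a trained Weka clusterer, used only for this demo."""

    def __init__(self, threshold):
        self.threshold = threshold

    def clusterInstance(self, instance):
        # Cluster 0 below the threshold, cluster 1 otherwise.
        if instance < self.threshold:
            return 0
        return 1


demo_assignments = cluster_assignments(ThresholdClusterer(5.0), [1.0, 7.5, 3.2, 9.9])
```

Under Jython with Weka on the classpath, the call would presumably look like `cluster_assignments(xmeans, data.enumerateInstances())`, assuming Jython iterates over the Java enumeration the same way the for loop above does.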