There is also a function, kmeans2, in scipy.cluster.vq that returns the labels along with the centroids.
In [7]: import numpy; from scipy.cluster.vq import kmeans2
In [8]: X = numpy.random.randn(100, 2)
In [9]: centroids, labels = kmeans2(X, 3)
In [10]: labels
Out[10]:
array([2, 1, 2, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 2, 2, 1, 2, 1, 2, 1, 2, 0,
1, 0, 2, 0, 1, 2, 0, 1, 0, 1, 1, 2, 2, 2, 2, 1, 2, 1, 1, 1, 2, 0, 0,
2, 2, 0, 1, 0, 0, 0, 2, 2, 2, 0, 0, 1, 2, 1, 0, 0, 0, 2, 1, 1, 1, 1,
1, 0, 0, 1, 0, 1, 2, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 2, 0, 2, 2, 0,
1, 1, 0, 1, 0, 0, 0, 2])
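For anyone who wants to run this as a script rather than in an IPython session, here is a minimal sketch of the same idea. The random data, the seed value, and the choice of minit="points" (which starts from actual data points) are my own assumptions for illustration, not part of the session above. The labels you get will differ from run to run because the initialization is random.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

# Illustrative data only: 100 random 2-D points (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# kmeans2 returns the centroids and, for each row of X, the index of its cluster.
# minit="points" picks the initial centroids from the data itself.
centroids, labels = kmeans2(X, 3, minit="points")

# Count how many points were assigned to each of the 3 clusters.
counts = np.bincount(labels, minlength=3)

print(centroids.shape)  # one row per cluster, one column per feature
print(labels[:10])      # cluster index of the first ten points
print(counts)
```

The labels array has one entry per row of X, so X[labels == 0] selects the points assigned to the first centroid.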
Otherwise, if you must use kmeans, you can also use vq to get labels:
In [17]: from scipy.cluster.vq import kmeans, vq
In [18]: codebook, distortion = kmeans(X, 3)
In [21]: code, dist = vq(X, codebook)
In [22]: code
Out[22]:
array([1, 0, 1, 0, 2, 2, 2, 0, 1, 1, 0, 2, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
2, 2, 1, 2, 0, 1, 1, 0, 2, 2, 0, 1, 0, 1, 0, 2, 1, 2, 0, 2, 1, 1, 1,
0, 1, 2, 0, 1, 2, 2, 1, 1, 1, 2, 2, 0, 0, 2, 2, 2, 2, 1, 0, 2, 2, 2,
0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 2, 1, 2, 0, 2, 0, 2, 2, 1, 1, 1, 1, 1,
2, 0, 2, 0, 2, 1, 1, 1])
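To make it clearer what vq is doing here, the sketch below runs kmeans followed by vq and then recomputes the labels by hand as the index of the nearest centroid. The random data and the names distances and manual_labels are assumptions of mine, used only for illustration. Note that kmeans can return fewer centroids than requested, so the code does not assume the codebook has exactly 3 rows.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

# Illustrative data only: 100 random 2-D points (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))

# kmeans returns the codebook (centroids) and the mean distortion, but no labels.
codebook, distortion = kmeans(X, 3)

# vq assigns each observation to its nearest codebook entry.
code, dist = vq(X, codebook)

# The same assignment computed by hand: Euclidean distance from every point
# to every centroid, then the index of the smallest distance per point.
distances = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
manual_labels = distances.argmin(axis=1)

print(np.array_equal(code, manual_labels))
print(np.allclose(dist, distances.min(axis=1)))
```

So code is the label array, and dist gives the distance from each point to the centroid it was assigned to.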
Documentation: scipy.cluster.vq