How to assign a new observation to existing k-means clusters

I used the code below to create k-means clusters with scikit-learn.

kmean = KMeans(n_clusters=nclusters,n_jobs=-1,random_state=2376,max_iter=1000,n_init=1000,algorithm='full',init='k-means++')

kmean_fit = kmean.fit(clus_data)

I have also saved the centroids, using kmean_fit.cluster_centers_.

I then pickled the k-means object.

filename = pickle_path+'\\'+'_kmean_fit.sav'
# Use a context manager so the file handle is closed after writing
with open(filename, 'wb') as f:
    pickle.dump(kmean_fit, f)

The idea is to load the same pickled k-means object later and apply it to new data as it arrives, using kmean_fit.predict().

Questions:

Will loading the pickled k-means object and calling kmean_fit.predict() assign the new observation to the existing clusters, based on the centroids of those clusters? Or does this approach just recluster from scratch on the new data?
If this method won't work, how can I assign the new observation to the existing clusters with efficient Python code, given that I have already saved the cluster centroids?

PS: I know that building a classifier with the existing clusters as the dependent variable is another way, but I don't want to do that because of a time crunch.

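Yes. predict does not recluster: it assigns each new observation to the nearest of the centroids learned during fit (stored in cluster_centers_). Pickling does not change this. If you un-pickle the sklearn.cluster.KMeans object correctly, you are dealing with the "same" original object, so you can use its predict method to assign a new observation to the existing clusters.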

An example:

from sklearn.cluster import KMeans
import joblib  # sklearn.externals.joblib was removed from scikit-learn; use the standalone package

model = KMeans(n_clusters = 2, random_state = 100)
X = [[0,0,1,0], [1,0,0,1], [0,0,0,1],[1,1,1,0],[0,0,0,0]]
model.fit(X)

Out:

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)

Continue:

joblib.dump(model, 'model.pkl')  
model_loaded = joblib.load('model.pkl')

model_loaded

Out:

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=100, tol=0.0001,
    verbose=0)

See how the n_clusters and random_state parameters are the same between the model and model_loaded objects? You're good to go.
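Matching constructor parameters is a weak check, since what predict actually relies on is the fitted centroids. A minimal sketch of a stronger check follows, assuming an in-memory pickle round trip stands in for the file on disk; the variable names centers_match and labels_match are only illustrative.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1],
              [1, 1, 1, 0], [0, 0, 0, 0]], dtype=float)
model = KMeans(n_clusters=2, n_init=10, random_state=100).fit(X)

# Round-trip through pickle in memory (assumption: equivalent to dumping
# to a file and loading it back later).
model_loaded = pickle.loads(pickle.dumps(model))

# The fitted centroids, not just the constructor parameters, should survive.
centers_match = np.array_equal(model.cluster_centers_,
                               model_loaded.cluster_centers_)
# Both objects should therefore assign the same labels to the same data.
labels_match = np.array_equal(model.predict(X), model_loaded.predict(X))
print(centers_match, labels_match)
```

If both values are True, the loaded object should behave like the original for prediction purposes.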

Predict with the "new" model:

model_loaded.predict([[0,0,0,0]])  # predict expects a 2D array: one row per observation

Out:

array([0])
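On the second question: if only the saved centroids are available, the same assignment can be done by hand, because each observation simply goes to the centroid with the smallest Euclidean distance. The sketch below uses NumPy only. The function name assign_to_centroids and the values in saved_centroids are hypothetical, standing in for whatever kmean_fit.cluster_centers_ contained when it was saved. It also assumes the new data has been preprocessed (for example scaled) in the same way as the data used for fitting.

```python
import numpy as np

def assign_to_centroids(new_points, centroids):
    """Return, for each row of new_points, the index of the nearest centroid."""
    new_points = np.atleast_2d(np.asarray(new_points, dtype=float))
    centroids = np.asarray(centroids, dtype=float)
    # Broadcasting gives an array of shape (n_points, n_clusters, n_features).
    diffs = new_points[:, None, :] - centroids[None, :, :]
    # Squared Euclidean distance from every point to every centroid.
    sq_dist = (diffs ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)

# Hypothetical centroids, standing in for a saved kmean_fit.cluster_centers_.
saved_centroids = np.array([[0.0, 0.0, 0.0, 0.5],
                            [1.0, 0.5, 0.5, 0.5]])

labels = assign_to_centroids([[0, 0, 0, 0], [1, 1, 1, 0]], saved_centroids)
print(labels)  # [0 1]
```

For very large batches the intermediate (n_points, n_clusters, n_features) array can get big, in which case processing the new data in chunks is one option.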
