I used the code below to create k-means clusters with scikit-learn.
kmean = KMeans(n_clusters=nclusters,n_jobs=-1,random_state=2376,max_iter=1000,n_init=1000,algorithm='full',init='k-means++')
kmean_fit = kmean.fit(clus_data)
I have also saved the centroids from kmean_fit.cluster_centers_.
I then pickled the fitted k-means object.
filename = pickle_path+'\\'+'_kmean_fit.sav'
pickle.dump(kmean_fit, open(filename, 'wb'))
The idea is to load the same pickled k-means object later and apply it to new data as it arrives, using kmean_fit.predict().
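To make the intended workflow concrete, here is a minimal sketch of the save, load, and predict round trip. The random arrays are hypothetical stand-ins for my real clus_data and for the new observations, the temporary directory replaces my real pickle_path, and I have cut the KMeans parameters down to a small set because some newer scikit-learn releases may not accept n_jobs or algorithm='full'.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for clus_data: 200 rows, 3 features.
rng = np.random.default_rng(0)
clus_data = rng.normal(size=(200, 3))

# Fit once on the original data (reduced parameter set, see note above).
kmean_fit = KMeans(n_clusters=4, n_init=10, random_state=2376).fit(clus_data)

# Pickle the fitted model; the with-block closes the file handle.
filename = os.path.join(tempfile.mkdtemp(), "_kmean_fit.sav")
with open(filename, "wb") as f:
    pickle.dump(kmean_fit, f)

# Later, possibly in another process: load the model and label new rows.
with open(filename, "rb") as f:
    loaded = pickle.load(f)

new_data = rng.normal(size=(5, 3))  # hypothetical new observations
new_labels = loaded.predict(new_data)
```

The new data must have the same columns, in the same order and with the same preprocessing, as the data the model was fitted on.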
Questions:
Will loading the pickled k-means object and calling kmean_fit.predict() assign each new observation to one of the existing clusters, based on the centroids of those clusters, or does it recluster from scratch on the new data? If this approach won't work, how can I efficiently assign new observations to the existing clusters in Python, given that I have already saved the cluster centroids?
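For reference, the manual fallback I have in mind is a nearest-centroid assignment from the saved centroids. This is only a rough sketch that assumes Euclidean distance; assign_to_centroids and the toy arrays are hypothetical and not part of my real code.

```python
import numpy as np


def assign_to_centroids(new_data, centroids):
    """Return, for each row of new_data, the index of the closest centroid."""
    new_data = np.asarray(new_data, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Differences between every row and every centroid via broadcasting,
    # shape (n_rows, n_clusters, n_features).
    diffs = new_data[:, None, :] - centroids[None, :, :]
    # Squared Euclidean distances, shape (n_rows, n_clusters).
    sq_dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    return sq_dist.argmin(axis=1)


# Hypothetical centroids and observations, for illustration only.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
new_data = np.array([[1.0, -1.0], [9.0, 11.0], [4.0, 4.0]])
labels = assign_to_centroids(new_data, centroids)
```

The broadcast builds an array of size n_rows x n_clusters x n_features, so for very large batches the rows would need to be processed in chunks.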
PS: I know that building a classifier with the existing clusters as the dependent variable is another option, but I don't want to do that because I am short on time.