I have a pandas DataFrame and would like to run clustering on each column separately. I am using sklearn, and this is what I have:
data = pd.read_csv("data.csv")
data = pd.DataFrame(data)
data = data.set_index("Time")
#print(data)
cluster_numbers = 2
list_of_cluster = []
for k, v in data.iteritems():
    temp = KMeans(n_clusters=cluster_numbers)
    temp.fit(data[k])
    print(k)
    print("predicted", temp.predict(data[k]))
    list_of_cluster.append(temp.predict(data[k]))
When I try to run it, I get this error: ValueError: n_samples=1 should be >= n_clusters=2
I am wondering what the problem is, since I have more samples than clusters. Any help will be appreciated.
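The K-Means clusterer expects a 2D array in which each row is a data point. A data point may have just one feature, but the array must still be two-dimensional. A 1D column is treated as a single sample with len(data) features, which is why the error reports n_samples=1. In your case you have to reshape the pandas column to a matrix with len(data) rows and 1 column, for example with data[k].values.reshape(-1, 1).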
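A minimal sketch of the reshape fix, not necessarily the exact example the answer originally had in mind. The small DataFrame here is a made-up stand-in for data.csv (the column names "a" and "b" and all values are assumptions), and n_init and random_state are set only to make the run repeatable. It also uses items(), since iteritems() is no longer available in recent pandas versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for data.csv: a "Time" index and two numeric columns.
data = pd.DataFrame(
    {
        "Time": [0, 1, 2, 3, 4, 5],
        "a": [1.0, 1.1, 0.9, 5.0, 5.2, 4.9],
        "b": [10.0, 0.2, 9.8, 0.1, 10.1, 0.3],
    }
).set_index("Time")

cluster_numbers = 2
clusters = {}  # column name -> array of cluster labels, one per row

for name, column in data.items():
    # Turn the 1D column into a 2D array of shape (len(data), 1):
    # one row per sample, one feature per row.
    X = column.to_numpy().reshape(-1, 1)
    model = KMeans(n_clusters=cluster_numbers, n_init=10, random_state=0)
    clusters[name] = model.fit_predict(X)
    print(name, "predicted", clusters[name])
```

Each entry in clusters holds one label per row of the DataFrame. The label numbers themselves (0 or 1) are arbitrary, so only the grouping is meaningful when comparing columns.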