I am using KMeans to cluster three time-series datasets with different characteristics. For reproducibility, I am sharing the data here.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

protocols = {}
types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]  # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds] / trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]
    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }

k_means = KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
                 n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
                 random_state=0, tol=0.0001, verbose=0)
k_means.fit(quotient.reshape(-1, 1))
This way, given a new data point (with quotient and quotient_times), I want to know which cluster it belongs to. The idea is to build each dataset by stacking the two transformed features, quotient and quotient_times, and cluster it with KMeans.
k_means.labels_ gives this output: array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)
Finally, I want to visualize the clusters using plt.plot(k_means, ".",color="blue"), but I am getting this error: TypeError: float() argument must be a string or a number, not 'KMeans'. How do we plot KMeans clusters?
If I understand correctly, what you want to plot is the decision boundary of your KMeans result. You can find an example of how to do it on the scikit-learn website here.
That example also applies PCA so that data with more than two dimensions can be visualized in 2D; for your data that step is irrelevant.
You can easily draw your points as a scatter plot colored by their KMeans cluster, so you can better understand where your clustering went wrong.
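As a rough sketch of that suggestion: the TypeError in the question comes from passing the KMeans estimator itself to plt.plot, which expects arrays of numbers. The snippet below plots the data instead and uses k_means.labels_ only to pick colors. The quotient and quotient_times values are made-up stand-ins, since the CSV files are not reproduced here.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Stand-in values (assumed); in the question these come from the CSV files.
quotient_times = np.arange(9.0)
quotient = np.array([0.11, 0.15, 0.12, 0.48, 0.52, 0.50, 0.88, 0.91, 0.86])

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(quotient.reshape(-1, 1))

# A KMeans estimator is not array-like, so plot the data arrays
# and use the fitted labels only to choose each point's color.
fig, ax = plt.subplots()
scatter = ax.scatter(quotient_times, quotient, c=k_means.labels_, cmap="viridis")
ax.set_xlabel("quotient_times")
ax.set_ylabel("quotient")
ax.set_title("Points colored by KMeans cluster")
```

Call plt.show() afterwards when running interactively.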
What you're effectively looking for is a range of values between which points are considered to be in a given class. It's quite unusual to use KMeans to classify 1d data in this way, although it certainly works. As you've noticed you need to convert your input data to a 2d array in order to use the method.
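A minimal sketch of that reshaping step, assuming a small made-up quotient array in place of the real data. It also shows one way to read off the "range of values" per class: in 1d, the boundary between two neighbouring clusters sits at the midpoint of their centroids.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in values for the question's `quotient` array (assumed, not the real data).
quotient = np.array([0.11, 0.15, 0.12, 0.48, 0.52, 0.50, 0.88, 0.91, 0.86])

# scikit-learn expects shape (n_samples, n_features), so make a single column.
quotient_2d = quotient.reshape(-1, 1)

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(quotient_2d)

# Points are assigned to the nearest centroid, so in 1d each boundary
# is the midpoint between two adjacent (sorted) centroids.
centroids = np.sort(k_means.cluster_centers_.ravel())
boundaries = (centroids[:-1] + centroids[1:]) / 2
print("centroids:", centroids)
print("boundaries:", boundaries)
```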
You will need quotient_2d again for the classification (prediction) step later.
First we can plot the centroids; since the data is 1d, the x-axis position is arbitrary.
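The centroid plot could look roughly like this. It is a sketch on made-up stand-in data; the color list and the fixed x position of 1 are arbitrary choices.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Stand-in values for the question's `quotient` array (assumed, not the real data).
quotient = np.array([0.11, 0.15, 0.12, 0.48, 0.52, 0.50, 0.88, 0.91, 0.86])
quotient_2d = quotient.reshape(-1, 1)
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(quotient_2d)

colors = ["r", "g", "b"]  # one color per cluster index
centroids = k_means.cluster_centers_.ravel()

fig, ax = plt.subplots()
for n, y in enumerate(centroids):
    # x=1 is arbitrary: the data has only one dimension, shown on the y-axis.
    ax.plot(1, y, marker="x", color=colors[n], ms=10)
ax.set_title("KMeans cluster centroids")
ax.set_ylabel("quotient")
```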
This produces the following plot.
To get cluster membership for the points, pass quotient_2d to .predict. This returns an array of integer cluster labels, one per point, in the same form as the k_means.labels_ output in the question.
We can use this to filter our original data, plotting each class in a separate color.
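A sketch of the prediction and per-class plotting, again on made-up stand-in data. The name new_point is a hypothetical value added to show how a new observation would be assigned to a cluster.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Stand-in values (assumed); in the question these come from the CSV files.
quotient_times = np.arange(9.0)
quotient = np.array([0.11, 0.15, 0.12, 0.48, 0.52, 0.50, 0.88, 0.91, 0.86])
quotient_2d = quotient.reshape(-1, 1)
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(quotient_2d)

# One integer cluster label per input point.
Z = k_means.predict(quotient_2d)

colors = ["r", "g", "b"]
fig, ax = plt.subplots()
for n in range(k_means.n_clusters):
    mask = Z == n  # boolean filter selecting the points in cluster n
    ax.plot(quotient_times[mask], quotient[mask], ".", color=colors[n])
ax.set_xlabel("quotient_times")
ax.set_ylabel("quotient")

# A new observation (hypothetical value) is classified the same way.
new_point = np.array([[0.5]])
new_label = k_means.predict(new_point)[0]
print("new point belongs to cluster", new_label)
```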
This generates the following plot with the original data, each point coloured by the cluster membership.