Plot KMeans clusters and classification for 1-dimensional data

I am using KMeans to cluster three time-series datasets with different characteristics. For reproducibility, I am sharing the data here.

Here is my code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

protocols = {}

types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    col_time, col_window = np.loadtxt(fname, delimiter=',').T
    trailing_window = col_window[:-1]  # "past" values at a given index
    leading_window = col_window[1:]    # "current" values at a given index
    decreasing_inds = np.where(leading_window < trailing_window)[0]
    quotient = leading_window[decreasing_inds]/trailing_window[decreasing_inds]
    quotient_times = col_time[decreasing_inds]

    protocols[protname] = {
        "col_time": col_time,
        "col_window": col_window,
        "quotient_times": quotient_times,
        "quotient": quotient,
    }



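# n_jobs, precompute_distances and algorithm='auto' have been removed from
# current scikit-learn releases, so only supported arguments are passed here.
# Note: `quotient` is the array left over from the last file in the loop (data3).
k_means = KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=300,
                 tol=0.0001, random_state=0)
k_means.fit(quotient.reshape(-1, 1))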

My goal: given a new data point (with a quotient and a quotient_times value), I want to know which cluster it belongs to. To do that, I build each dataset by stacking the two transformed features, quotient and quotient_times, and cluster them with KMeans.
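A minimal sketch of the two-feature version, assuming synthetic stand-ins for one dataset's quotient and quotient_times (the real data1.csv to data3.csv are not shown). The StandardScaler step and the helper name classify are my own additions, not part of the question: time spans roughly 0 to 100 while the quotient lies in (0, 1), so without scaling the Euclidean distance KMeans uses would be dominated by time.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for one dataset's transformed features.
rng = np.random.default_rng(0)
quotient_times = np.sort(rng.uniform(0, 100, size=60))
quotient = np.concatenate([
    rng.normal(0.3, 0.02, 20),
    rng.normal(0.6, 0.02, 20),
    rng.normal(0.9, 0.02, 20),
])

# Stack the two features column-wise: shape (n_samples, 2).
X = np.column_stack((quotient_times, quotient))

# Put both features on a comparable scale before clustering (assumption:
# equal weight for time and quotient is what you want).
scaler = StandardScaler().fit(X)
k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
k_means.fit(scaler.transform(X))


def classify(time, q):
    """Hypothetical helper: cluster index for one new (quotient_times, quotient) point."""
    point = scaler.transform([[time, q]])  # apply the same scaling as the training data
    return int(k_means.predict(point)[0])
```

For example, classify(50.0, 0.6) returns one of 0, 1 or 2. The new point must go through the same scaler that was fitted on the training data, otherwise its distances to the centroids are meaningless.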

k_means.labels_ gives this output:

array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)

Finally, I want to visualize the clusters using plt.plot(k_means, ".", color="blue"), but I get this error:

TypeError: float() argument must be a string or a number, not 'KMeans'

How can I plot the KMeans clusters?

Tags: python matplotlib machine-learning scikit-learn k-means

2 answers

干净又极端

#2 · 2019-06-03 23:00

If I understand correctly, what you want to plot is the decision boundary of your KMeans result. You can find an example of how to do this on the scikit-learn website here.

That example also applies PCA so that data with more than two dimensions can be visualized in 2D; for your data that step is irrelevant.

You can also simply draw your points as a scatter plot colored by the KMeans assignment, which makes it easier to see where your clustering went wrong.
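A minimal sketch of the decision-boundary idea for one feature, assuming synthetic data in place of the question's quotient and a plain index in place of quotient_times. Like the scikit-learn example, which predicts on a 2D mesh grid, it predicts a cluster for a dense grid of values; in 1D the boundaries are simply the values where that prediction changes.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical 1-D data standing in for `quotient`, plotted against an index.
rng = np.random.default_rng(1)
quotient = np.concatenate([rng.normal(m, 0.03, 15) for m in (0.3, 0.6, 0.9)])
quotient_times = np.arange(quotient.size, dtype=float)

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = k_means.fit_predict(quotient.reshape(-1, 1))

# Predict a cluster for a dense grid of quotient values.
grid = np.linspace(quotient.min(), quotient.max(), 500)
grid_labels = k_means.predict(grid.reshape(-1, 1))

# A boundary lies between two neighbouring grid points with different labels.
change = np.flatnonzero(np.diff(grid_labels) != 0)
boundaries = (grid[change] + grid[change + 1]) / 2

fig, ax = plt.subplots()
ax.scatter(quotient_times, quotient, c=labels, cmap="viridis")
for b in boundaries:
    # The quotient is on the y-axis, so each boundary is a horizontal line.
    ax.axhline(b, color="black", linestyle="--", linewidth=1)
ax.set_xlabel("index (stand-in for quotient_times)")
ax.set_ylabel("quotient")
ax.set_title("KMeans decision boundaries (1-D)")
```

Call plt.show() or fig.savefig(...) to see the result. With a single feature the boundaries land halfway between neighbouring centroids, so the grid is only needed if you want the same code to work for other estimators.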


别忘想泡老子

#3 · 2019-06-03 23:08

What you're effectively looking for is a range of values between which points are considered to be in a given class. It's quite unusual to use KMeans to classify 1D data in this way, although it certainly works. As you've noticed, you need to convert your input data to a 2D array (one column per feature) in order to use the method.
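Because each point goes to its nearest centroid, in 1D those value ranges are delimited by the midpoints between neighbouring sorted centroids. A sketch on synthetic data (the helper name classify_by_threshold is hypothetical) that computes the ranges and checks that np.digitize against them reproduces k_means.predict:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 1-D data standing in for `quotient`.
rng = np.random.default_rng(2)
quotient = np.concatenate([rng.normal(m, 0.03, 15) for m in (0.3, 0.6, 0.9)])
k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(quotient.reshape(-1, 1))

centers = k_means.cluster_centers_.ravel()
order = np.argsort(centers)               # cluster indices, lowest centroid first
sorted_centers = centers[order]
thresholds = (sorted_centers[:-1] + sorted_centers[1:]) / 2


def classify_by_threshold(values):
    """Map 1-D values to KMeans cluster indices using the midpoint thresholds."""
    ranks = np.digitize(values, thresholds)  # 0 for the lowest range, 1 for the next, ...
    return order[ranks]                      # translate range rank back to cluster index


# Print the value range that belongs to each cluster.
edges = np.concatenate(([-np.inf], thresholds, [np.inf]))
for rank, cluster in enumerate(order):
    print(f"cluster {cluster}: {edges[rank]:.3f} <= quotient < {edges[rank + 1]:.3f}")

new_values = np.array([0.25, 0.55, 0.62, 1.0])
```

A value lying exactly on a threshold is equidistant from two centroids, so that is the only case where the two methods could disagree.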

# n_jobs, precompute_distances and algorithm='auto' are omitted because
# current scikit-learn releases no longer accept them.
k_means = KMeans(n_clusters=3, init='k-means++', n_init=10, max_iter=300,
                 tol=0.0001, random_state=0)

quotient_2d = quotient.reshape(-1,1)
k_means.fit(quotient_2d)

You will need the quotient_2d again for the classification (prediction) step later.
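The same shape rule applies to new observations: predict expects an array of shape (n_samples, n_features), even for a single value. A short sketch on synthetic data; new_quotients is a hypothetical name for freshly computed quotient values.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 1-D data standing in for `quotient`.
rng = np.random.default_rng(3)
quotient = np.concatenate([rng.normal(m, 0.03, 15) for m in (0.3, 0.6, 0.9)])
quotient_2d = quotient.reshape(-1, 1)  # shape (n_samples, 1)

k_means = KMeans(n_clusters=3, n_init=10, random_state=0).fit(quotient_2d)

# Several new values: reshape the same way before predicting.
new_quotients = np.array([0.31, 0.88])
new_labels = k_means.predict(new_quotients.reshape(-1, 1))

# A single value also needs the 2-D shape: one row, one column.
single_label = int(k_means.predict(np.array([[0.6]]))[0])

# Passing the flat 1-D array instead raises a ValueError asking for a 2-D array.
try:
    k_means.predict(new_quotients)
except ValueError as err:
    print("ValueError:", err)
```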

First, we can plot the centroids. Since the data is 1D, the x-axis position is arbitrary.

colors = ['r','g','b']
centroids = k_means.cluster_centers_
for n, y in enumerate(centroids):
    plt.plot(1, y, marker='x', color=colors[n], ms=10)
plt.title('Kmeans cluster centroids')

This produces the following plot.

To get the cluster membership of each point, pass quotient_2d to .predict. This returns an array of cluster indices, e.g.

>>> Z = k_means.predict(quotient_2d)
>>> Z
array([1, 1, 0, 1, 2, 1, 0, 0, 2, 0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0], dtype=int32)

We can use this to filter our original data, plotting each class in a separate color.

# Plot each class as a separate colour
n_clusters = 3
for n in range(n_clusters):
    # Select the points assigned to cluster n.
    ys = quotient[Z == n]
    xs = quotient_times[Z == n]

    plt.scatter(xs, ys, color=colors[n])

plt.title("Points by cluster")

This generates the following plot with the original data, each point coloured by the cluster membership.
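An alternative sketch that does the same in a single scatter call by passing the labels as c=, adds a dashed line at each centroid and builds a legend with PathCollection.legend_elements (available in Matplotlib 3.1 and later). The data and times are synthetic stand-ins, not the question's CSV files.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical stand-ins for `quotient` and `quotient_times`.
rng = np.random.default_rng(4)
quotient = np.concatenate([rng.normal(m, 0.03, 15) for m in (0.3, 0.6, 0.9)])
quotient_times = rng.permutation(np.arange(quotient.size, dtype=float))

k_means = KMeans(n_clusters=3, n_init=10, random_state=0)
Z = k_means.fit_predict(quotient.reshape(-1, 1))

fig, ax = plt.subplots()
# One call: colour each point by its cluster label.
points = ax.scatter(quotient_times, quotient, c=Z, cmap="viridis")

# Each centroid is a quotient value, so draw it as a horizontal line.
for center in k_means.cluster_centers_.ravel():
    ax.axhline(center, color="grey", linestyle="--", linewidth=1)

# legend_elements returns one handle per unique label, in sorted order (0, 1, 2).
handles, _ = points.legend_elements()
ax.legend(handles, [f"cluster {n}" for n in range(k_means.n_clusters)], title="KMeans label")
ax.set_title("Points by cluster")
```

The per-cluster loop above gives you full control over each colour; the single call is shorter when the number of clusters changes.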
