Suppose I have a dataframe that contains 1000 rows, where each row represents a time series.
I have built a DTW algorithm to calculate the distance between two rows.
I don't know what to do next to accomplish an unsupervised classification task on the dataframe.
How can I label all rows of the dataframe?
Definitions
Below I show step by step how two time series can be built and how the Dynamic Time Warping (DTW) distance between them can be computed. You can also run unsupervised k-means clustering with scikit-learn. If you do not specify the number of centroids, scikit-learn falls back to its default of `n_clusters=8`; the separate `algorithm` parameter, whose default in older releases was `auto`, only selects which k-means variant is used.
Building the time-series and computing the DTW
You have two time series and you compute the DTW distance between them.
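As a minimal sketch of that step, assuming one-dimensional series and an absolute-difference local cost (the function name `dtw_distance` and the two example series are made up for illustration), the classic dynamic-programming formulation looks roughly like this:

```python
def dtw_distance(a, b):
    """DTW distance between two 1-D sequences, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matched to an earlier point of b
                cost[i][j - 1],      # b[j-1] also matched to an earlier point of a
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Two hypothetical series of different length; the second repeats its first value
series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [1.0, 1.0, 2.0, 3.0, 4.0]
distance = dtw_distance(series_a, series_b)
print(distance)  # 0.0, because the repeated value is absorbed by the warping
```

The series do not need equal length, which is the main reason to prefer DTW over a plain Euclidean distance here.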
Classification of the time-series with KNN
It is not evident from the question what should be labelled and with which labels, so please provide those details.
Once they are known, we can decide on a classification algorithm, which may be the so-called KNN algorithm. It works with two separate data sets: a training set and a test set. With the training set, you teach the algorithm to label the time series, while the test set lets us measure how well the model works, using model selection tools such as AUC.
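To make the training/test idea concrete, here is a small sketch of a k-nearest-neighbour classifier that uses DTW as its distance. Everything in it is an assumption for illustration: the labels "rising" and "falling", the tiny data sets, and the helper `knn_predict`. It reports plain accuracy rather than AUC to keep the example short.

```python
from collections import Counter


def dtw_distance(a, b):
    """DTW distance with absolute-difference local cost (see the DTW step above)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def knn_predict(train_series, train_labels, query, k=1):
    """Label `query` by majority vote among its k DTW-nearest training series."""
    ranked = sorted(
        (dtw_distance(query, s), label) for s, label in zip(train_series, train_labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled training set and held-out test set
train_series = [[0, 1, 2, 3], [0, 0, 1, 2, 3], [3, 2, 1, 0], [3, 3, 2, 1, 0]]
train_labels = ["rising", "rising", "falling", "falling"]
test_series = [[0, 1, 1, 2, 3], [3, 2, 2, 1, 0]]
test_labels = ["rising", "falling"]

predictions = [knn_predict(train_series, train_labels, s, k=1) for s in test_series]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

Note that this only works once you have labels for the training rows, which is exactly the detail missing from the question.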
A small puzzle is left open until the details asked for above are provided.
A scikit-learn article comparing classifiers is listed as the second item under Further reading.
Clustering with K-means (not the same as KNN)
K-means is a clustering algorithm and it is unsupervised, so you can use it without any labels.
This makes it a very different algorithm from KNN. Further material on the difference is listed as the first item under Further reading.
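A minimal sketch of that clustering step with scikit-learn follows. The synthetic array `X` is a made-up stand-in for the 1000-row dataframe, and `n_clusters=2` is an assumption you would need to choose for your own data. Be aware that `KMeans` works with Euclidean distance on the raw rows, so this sketch does not use the DTW distance at all.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: 1000 series of length 50, half rising and half falling
t = np.linspace(0.0, 1.0, 50)
rising = t + 0.05 * rng.standard_normal((500, 50))
falling = (1.0 - t) + 0.05 * rng.standard_normal((500, 50))
X = np.vstack([rising, falling])  # with a dataframe, df.to_numpy() plays this role

# No labels are passed: k-means only sees the rows themselves
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)

# One cluster id per row; with a dataframe you could store it as df["cluster"] = labels
print(labels[:5], labels[-5:])
```

The cluster ids are arbitrary integers, so they tell you which rows belong together, not what the groups mean.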
Further reading
Does K-means incorporate the K-nearest-neighbour algorithm?
Comparison of classifiers in scikit-learn
You can utilize DTW. In fact, I had the same problem for one of my projects and I wrote my own class for that in Python.
Here is the logic:
n! / (k! (n-k)!).
These would be something like potential centers.
And the code:
If you want to see it in action, you can refer to my repository on Time Series Clustering.
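For readers who want something to experiment with, here is a small sketch of how the combination idea could work. This is not the class from the repository. It assumes that n is the number of series, k is the number of clusters, every set of k series is tried as candidate centers, each series is assigned to its DTW-nearest center, and the set with the lowest total distance wins. The function name `cluster_by_combinations` and the example data are made up.

```python
from itertools import combinations
from math import comb


def dtw_distance(a, b):
    """DTW distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def cluster_by_combinations(series, k):
    """Try every k-subset of series as centers; keep the lowest total DTW cost."""
    best_cost, best_centers, best_labels = float("inf"), None, None
    for centers in combinations(range(len(series)), k):
        labels, total = [], 0.0
        for s in series:
            dists = [dtw_distance(s, series[c]) for c in centers]
            nearest = min(range(k), key=dists.__getitem__)
            labels.append(nearest)
            total += dists[nearest]
        if total < best_cost:
            best_cost, best_centers, best_labels = total, centers, labels
    return best_centers, best_labels, best_cost


# Hypothetical data: three rising and three falling series
series = [[0, 1, 2, 3], [0, 0, 1, 2, 3], [0, 1, 2, 2, 3],
          [3, 2, 1, 0], [3, 3, 2, 1, 0], [3, 2, 1, 1, 0]]
n_candidates = comb(len(series), 2)  # n! / (k! (n-k)!) with n=6, k=2
centers, labels, cost = cluster_by_combinations(series, 2)
print(n_candidates, centers, labels, cost)
```

The number of candidate sets grows very quickly: with 1000 rows and k = 3 there are already 166,167,000 of them, so an exhaustive search like this is only practical for small inputs.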