In order to clusterize a set of time series I'm looking for a smart distance metric. I've tried some well known metric but no one fits to my case.
ex: Let's assume that my cluster algorithm extracts this three centroids [s1, s2, s3]:
I want to put this new example [sx] in the most similar cluster:
The most similar centroids is the second one, so I need to find a distance function d that gives me d(sx, s2) < d(sx, s1)
and d(sx, s2) < d(sx, s3)
edit
Here the results with metrics [cosine, euclidean, minkowski, dynamic type warping] ]3
edit 2
User Pietro P suggested to apply the distances on the cumulated version of the time series The solution works, here the plots and the metrics:
nice question! using any standard distance of R^n (euclidean, manhattan or generically minkowski) over those time series cannot achieve the result you want, since those metrics are independent of the permutations of the coordinate of R^n (while time is strictly ordered and it is the phenomenon you want to capture).
A simple trick, that can do what you ask is using the cumulated version of the time series (sum values over time as time increases) and then apply a standard metric. Using the Manhattan metric, you would get as a distance between two time series the area between their cumulated versions.
what about using standard Pearson correlation coefficient? then you can assign the new point to the cluster with the highest coefficient.
correlation = scipy.stats.pearsonr(<new time series>, <centroid>)