mlpy - Dynamic Time Warping depends on x?

Posted 2020-04-07 20:42

Question:

I am trying to get the distance between these two arrays shown below by DTW.

I am using the Python mlpy package that offers

dist, cost, path = mlpy.dtw_std(y1, y2, dist_only=False)

I understand that DTW does take care of the "shifting". In addition, as can be seen from above, the mlpy.dtw_std() only takes in 2 1-D arrays. So I expect that no matter how I left/right shift my curves, the dist returned by the function should never change.

However after shifting my green curve a bit to the right, the dist returned by mlpy.dtw_std() changes!

Before shifting: mlpy.dtw_std reports dist = 14.014

After shifting: mlpy.dtw_std reports dist = 38.078

Since the curves are still the same two curves, I don't expect the distances to be different!

Why is that? What went wrong?

Answer 1:

Let me restate what I have understood; please correct me if I am wrong anywhere. In both of your plots the blue 1-D series stays identical, while the green one is stretched. You explained how you do this in your comment of Sep 19 '13 at 9:36. Your premise is that because (1) DTW 'takes care' of time shift and (2) all you are doing is stretching one time series length-wise without affecting its y-values, (inference) the distance should remain the same.

There is a missing link between [(1), (2)] and [(inference)]: the individual distance values for each mapping WILL change when you change the set of signals itself, and this changes the overall distance. Plot the warping path and the cost grid to see it for yourself.

Let's take an oversimplified case...

Let a=range(0,101,5) = [0,5,10,15...95, 100]

and b=range(0,101,5) = [0,5,10,15...95, 100].

Intuitively, you and I would expect a one-to-one correspondence between the two signals in the DTW mapping, and a distance of 0 for every mapping, since the signals are identical.

Now if we make b=range(0,101,4) = [0,4,8,12...96,100], the DTW mapping between a and b would still start with a's 0 mapped to b's 0 and end with a's 100 mapped to b's 100 (boundary constraints). Also, because DTW 'takes care' of time shift, I would expect the 20's, 40's, 60's and 80's of the two signals to be mapped to one another. (I haven't tried DTWing these two myself and am speaking from intuition, so please check. Non-intuitive warpings are also possible, depending on the step patterns allowed and any global constraints, but let's go with the intuitive warping for simplicity.)

For the remaining data points, the values no longer coincide (b's 4 has no exact counterpart in a, for example), so the distances for those mappings are non-zero and therefore the overall distance is non-zero too. Our overall cost has changed from zero to something non-zero.
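The toy case above can be checked with a few lines of plain Python. This is only a minimal sketch of textbook DTW, assuming an absolute-difference local cost and the three standard steps (horizontal, vertical, diagonal) with no global constraint; the helper name dtw_distance is made up for this illustration, and its result is not guaranteed to match what mlpy.dtw_std reports.

```python
def dtw_distance(x, y):
    """Textbook DTW: |x[i] - y[j]| local cost, steps (1,0), (0,1), (1,1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            acc[i][j] = local + min(acc[i - 1][j],      # repeat y[j-1]
                                    acc[i][j - 1],      # repeat x[i-1]
                                    acc[i - 1][j - 1])  # advance both
    return acc[n][m]


a = list(range(0, 101, 5))            # 21 samples
b_same = list(range(0, 101, 5))       # identical to a
b_resampled = list(range(0, 101, 4))  # same line, 26 samples

d_same = dtw_distance(a, b_same)
d_resampled = dtw_distance(a, b_resampled)
print("identical series:", d_same)
print("resampled series:", d_resampled)
```

Both b series trace the same straight line from 0 to 100, yet only the identical one gives a zero distance: DTW compares sample values, and most samples of the resampled series have no exact partner in a.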

That was a case where the signals were very simple and linearly increasing. Imagine the variability that comes into the picture when you have real-life non-monotonic signals and need to find a time warping between them. :)

(PS: Please don't forget to upvote answer :D). Thanks.



Answer 2:

Obviously, the curves are not identical, and therefore the distance function must not be 0 (otherwise, it is not a distance by definition).

What IS "relatively large"? The distance probably is not infinite, is it?

With 140 points in time, each contributing a small delta, the total still adds up to a non-zero number.

The distance "New York" to "Beijing" is roughly 11018 km. Or 11018000000 mm.

The distance to Alpha Centauri is small, just 4.34 light-years. That is the nearest other stellar system to us...

Compare with the distance to a non-similar series; that distance should be much larger.
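To make that comparison concrete, here is a rough sketch in plain Python. All three series are invented for illustration (a 140-point sine, the same sine shifted by 5 samples, and an unrelated ramp), and dtw_distance is a hypothetical textbook implementation with absolute-difference cost, not mlpy's own routine.

```python
import math


def dtw_distance(x, y):
    """Textbook DTW: |x[i] - y[j]| local cost, steps (1,0), (0,1), (1,1)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            acc[i][j] = local + min(acc[i - 1][j],
                                    acc[i][j - 1],
                                    acc[i - 1][j - 1])
    return acc[n][m]


n = 140                    # number of time points, as in the answer
omega = 2 * math.pi / 70   # two full sine periods over the window (assumed)

reference = [math.sin(omega * i) for i in range(n)]
shifted = [math.sin(omega * (i + 5)) for i in range(n)]  # same shape, moved 5 samples
ramp = [-3 + 6 * i / (n - 1) for i in range(n)]          # unrelated shape

d_similar = dtw_distance(reference, shifted)
d_dissimilar = dtw_distance(reference, ramp)
print("sine vs shifted sine:", d_similar)
print("sine vs ramp:        ", d_dissimilar)
```

The shifted pair does not come out at zero, because the boundary samples are forced to match and many small deltas accumulate, but it comes out clearly below the distance to the unrelated ramp. That relative ordering is what the raw number is useful for.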