Two questions about the mlpy.dtw package in Python

Posted 2019-08-04 23:02

As a newbie to Dynamic Time Warping (DTW), I find that its Python implementation, mlpy.dtw, is not documented in much detail. I have some problems understanding its return value.

Regarding the returned value dist, I have two questions:

  • Any typo here? For standard DTW, the documentation says

Standard DTW as described in [Muller07], using the Euclidean distance (absolute value of the difference) or squared Euclidean distance (as in [Keogh01]) as local cost measure.

and for subsequence DTW, the documentation says

Subsequence DTW as described in [Muller07], assuming that the length of y is much larger than the length of x and using the Manhattan distance (absolute value of the difference) as local cost measure.

Does the same "absolute value of the difference" really correspond to two different distance metrics?

  • Total distance? After running the snippet

    dist, cost, path = mlpy.dtw_std(x, y, dist_only=False)

dist is a single value. So is it the sum of the distances over all matched pairs?

2 answers
别忘想泡老子
#2 · 2019-08-04 23:36

This looks like an error in the documentation. In general, "absolute value of the difference" describes the Manhattan metric, not the Euclidean distance. The author was probably thinking of the one-dimensional case: on the real line (ℝ) the Euclidean and Manhattan metrics coincide, and there the Euclidean distance really is the absolute value of the difference. I am not familiar with the library; if it only operates on one-dimensional values, then there is no error and the two distance measures are equivalent.
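To make the one-dimensional point concrete, here is a small sketch in plain Python. The helper names `manhattan` and `euclidean` are mine, not part of mlpy; they only illustrate that the two metrics agree for scalars and differ once there is more than one coordinate.

```python
import math

def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(p - q) for p, q in zip(a, b))

def euclidean(a, b):
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# One dimension: both reduce to |a - b|.
print(manhattan([2.0], [5.5]), euclidean([2.0], [5.5]))      # 3.5 3.5

# Two dimensions: the metrics no longer agree.
print(manhattan([0, 0], [3, 4]), euclidean([0, 0], [3, 4]))  # 7 5.0
```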

The dist value is the cost of the best time warp, i.e. the sum of the matching costs along the optimal alignment (see the algorithm definition on Wikipedia). In effect it is a minimum edit distance between the two sequences, where the cost of each edit is the dissimilarity (distance) between the "matched" objects.
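As a rough illustration of "sum of the costs along the best warp", here is a minimal textbook DTW in plain Python. It is a sketch of the classic dynamic-programming recurrence, not mlpy's actual implementation; the function name `dtw` and its `squared` flag are my own and only mimic the idea of choosing between absolute and squared difference as the local cost.

```python
def dtw(x, y, squared=False):
    """Textbook DTW on 1-D sequences (non-empty); returns (dist, path)."""
    def local_cost(a, b):
        return (a - b) ** 2 if squared else abs(a - b)

    n, m = len(x), len(y)
    inf = float("inf")
    # acc[i][j] = cheapest total cost of aligning x[:i] with y[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = local_cost(x[i - 1], y[j - 1]) + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )

    # Backtrack from the end to recover the matched index pairs.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path

x, y = [1, 2, 3], [4, 5, 6]
dist, path = dtw(x, y)
print(dist, path)                               # 9.0 [(0, 0), (1, 1), (2, 2)]
# The total is exactly the sum of local costs over the matched pairs.
print(sum(abs(x[i] - y[j]) for i, j in path))   # 9
```

So dist is a single number because the per-pair costs along the optimal path are added up; the path itself tells you which pairs were matched.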

Explosion°爆炸
#3 · 2019-08-04 23:54

Yes, the mlpy.dtw() function is not well documented.

  • First question: no typo here. As you can see in the documentation, the Euclidean, squared Euclidean and Manhattan distances all refer to the local cost measure. Here the local cost is defined as a distance between two real values (one dimension); see cost in the pseudocode at http://en.wikipedia.org/wiki/Dynamic_time_warping. In one dimension the Manhattan distance and the Euclidean distance are the same (http://en.wikipedia.org/wiki/Euclidean_distance#One_dimension). In standard DTW you can choose between the Euclidean distance (absolute value of the difference) and the squared Euclidean distance (squared difference) with the squared parameter:
>>> import mlpy
>>> mlpy.dtw_std([1,2,3], [4,5,6], squared=False) # Euclidean distance
9.0
>>> mlpy.dtw_std([1,2,3], [4,5,6], squared=True) # Squared Euclidean distance
26.0

Cheers, Davide
