Two Issues about mlpy.dtw package in Python?

As a newcomer to Dynamic Time Warping (DTW), I find that its Python implementation mlpy.dtw is not documented in much detail. I have some questions about its return value.

Regarding the returned value dist, I have two questions:

Is there a typo here? For standard DTW, the documentation says

Standard DTW as described in [Muller07], using the Euclidean distance (absolute value of the difference) or squared Euclidean distance (as in [Keogh01]) as local cost measure.

and for subsequence DTW, the documentation says

Subsequence DTW as described in [Muller07], assuming that the length of y is much larger than the length of x and using the Manhattan distance (absolute value of the difference) as local cost measure.

So the same "absolute value of the difference" corresponds to two different distance metrics?

Total distance? After running the snippet

dist, cost, path = mlpy.dtw_std(x, y, dist_only=False)

dist is a single value. Is it the sum of the distances between all matched pairs?

Tags: python machine-learning

2 Answers

别忘想泡老子

#2 · 2019-08-04 23:36

This looks like an imprecision in the documentation. In general, "absolute value of the difference" describes the Manhattan metric, not the Euclidean one. The author was probably thinking of the one-dimensional case: on the real line ℝ, the Euclidean and Manhattan metrics coincide, because sqrt((a - b)^2) = |a - b|. I am not familiar with the library; if it only operates on one-dimensional sequences, then there is no error and the two distance measures are equivalent.
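To make the one-dimensional argument concrete, here is a small sketch in plain Python. The helper names euclidean and manhattan are illustrative only and are not part of mlpy.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length points."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

# One dimension: both metrics reduce to |a - b|.
print(euclidean([2.0], [5.0]), manhattan([2.0], [5.0]))  # 3.0 3.0

# Two dimensions: the metrics differ.
print(euclidean([0.0, 0.0], [3.0, 4.0]), manhattan([0.0, 0.0], [3.0, 4.0]))  # 5.0 7.0
```

So the wording "Euclidean distance (absolute value of the difference)" is only accurate when each sample is a single real number.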

The dist value is the cost of the best time warp, that is, the sum of the local matching costs along the optimal alignment (see the algorithm definition on Wikipedia). It is in effect a minimum edit distance between the two sequences, where the cost of each edit is the dissimilarity (distance) between the "matched" elements.
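As a rough illustration of "sum of the matching costs", the following is a textbook DTW sketch for one-dimensional sequences in plain Python. It is not mlpy's implementation; the function name dtw_sketch and its squared flag are assumptions made for this example.

```python
def dtw_sketch(x, y, squared=False):
    """Textbook DTW on 1-D sequences. Returns (dist, path).

    Hypothetical helper for illustration; not the mlpy code.
    """
    n, m = len(x), len(y)
    inf = float("inf")

    def local(a, b):
        # Local cost: |a - b|, or (a - b)^2 when squared=True.
        d = float(abs(a - b))
        return d * d if squared else d

    # acc[i][j] = minimal accumulated cost of aligning x[:i+1] with y[:j+1].
    acc = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                best = 0.0
            else:
                best = min(
                    acc[i - 1][j] if i > 0 else inf,
                    acc[i][j - 1] if j > 0 else inf,
                    acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
                )
            acc[i][j] = local(x[i], y[j]) + best

    # Backtrack from the last cell to (0, 0) via the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path

x, y = [1, 2, 3], [4, 5, 6]
dist, path = dtw_sketch(x, y)
print(dist)  # 9.0
# dist equals the sum of the local costs over the matched pairs.
print(sum(abs(x[i] - y[j]) for i, j in path) == dist)  # True
```

Under these assumptions, dist is exactly the accumulated local cost along the returned path, which answers the "lumped sum" question for this sketch.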

Explosion°爆炸

#3 · 2019-08-04 23:54

Yes, the mlpy.dtw() function is not well documented.

First question: no typo here. As you can see in the documentation, the Euclidean, squared Euclidean and Manhattan distances all refer to the local cost measure. In this case the cost measure is defined as a distance between two real values (one dimension); see cost in the pseudocode at http://en.wikipedia.org/wiki/Dynamic_time_warping. So, in this case, the Manhattan distance and the Euclidean distance are the same (http://en.wikipedia.org/wiki/Euclidean_distance#One_dimension). In any case, in the standard DTW you can choose between the Euclidean distance (absolute value of the difference) and the squared Euclidean distance (squared difference) with the squared parameter:

>>> import mlpy
>>> mlpy.dtw_std([1,2,3], [4,5,6], squared=False) # Euclidean distance
9.0
>>> mlpy.dtw_std([1,2,3], [4,5,6], squared=True) # Squared Euclidean distance
26.0

Second question: dist is the total cost of the minimum-distance warp path between the time series x and y, that is, the unnormalized DTW distance. You can normalize it by dividing by len(x) + len(y). See http://www.irit.fr/~Julien.Pinquier/Docs/TP_MABS/res/dtw-sakoe-chiba78.pdf
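A minimal sketch of that normalization, using the 9.0 value from the example above. The helper normalize_dtw is a hypothetical name and is not part of mlpy.

```python
def normalize_dtw(dist, x, y):
    """Divide an unnormalized DTW distance by len(x) + len(y)."""
    return dist / (len(x) + len(y))

x, y = [1, 2, 3], [4, 5, 6]
dist = 9.0  # value shown earlier for mlpy.dtw_std(x, y, squared=False)
print(normalize_dtw(dist, x, y))  # 1.5
```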

Cheers, Davide
