Different results and performances with different

Posted 2019-03-01 05:36

Question:

I'm comparing the libraries dtaidistance, fastdtw and cdtw for DTW computations. This is my code:

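# Import the fastdtw function only; a plain "import fastdtw" afterwards would
# rebind the name to the module and make the fastdtw(s1, s2) call fail.
from fastdtw import fastdtw
from cdtw import pydtw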
import array
from timeit import default_timer as timer
from dtaidistance import dtw, dtw_visualisation as dtwvis

s1 = mySampleSequences[0] # first sample sequence consisting of 3000 samples
s2 = mySampleSequences[1] # second sample sequence consisting of 3000 samples

start = timer()
distance1 = dtw.distance(s1, s2)
end = timer()
start2 = timer()
distance2 = dtw.distance_fast(array.array('d',s1),array.array('d',s2))
end2 = timer()
start3 = timer()
distance3, path3 = fastdtw(s1,s2)
end3 = timer()
start4 = timer()
distance4 = pydtw.dtw(s1,s2).get_dist()
end4 = timer()

print("dtw.distance(x,y) time: "+ str(end - start))
print("dtw.distance(x,y) distance: "+str(distance1))
print("dtw.distance_fast(x,y) time: "+ str(end2 - start2))
print("dtw.distance_fast(x,y) distance: " + str(distance2))
print("fastdtw(x,y) time: "+ str(end3 - start3))
print("fastdtw(x,y) distance: " + str(distance3))
print("pydtw.dtw(x,y) time: "+ str(end4 - start4))
print("pydtw.dtw(x,y) distance: " + str(distance4))

This is the output I get:

  • dtw.distance(x,y) time: 22.16925272245262
  • dtw.distance(x,y) distance: 1888.8583853746156
  • dtw.distance_fast(x,y) time: 0.3889036471839056
  • dtw.distance_fast(x,y) distance: 1888.8583853746156
  • fastdtw(x,y) time: 0.23296659641047412
  • fastdtw(x,y) distance: 27238.0
  • pydtw.dtw(x,y) time: 0.13706478039556558
  • pydtw.dtw(x,y) distance: 17330.0

My question is: Why do I get different run times and different distances? Thank you very much for your comments.

// edit: The unit of the time measurements is seconds.

Answer 1:

Edit: What are the units of the time measurements? I suspect you compared them as if they were all in the same unit. If dtw.distance were reported in microseconds, for example, and the others in milliseconds, dtw.distance would look slower when the opposite is actually true.

There are different ways to measure the distance between two points. It could be based on the standard deviation, or it could simply be the Euclidean distance. Here is a list of many of those distances.

Some of them are more computationally intensive than others, and they also have different meanings. fastdtw, for example, takes the type of distance you want as its third argument (dist), as described on its GitHub page:

distance3, path3 = fastdtw(s1, s2, dist = euclidean)
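To illustrate why the choice of point-wise distance changes the final number, here is a minimal pure-Python sketch of the classic full-matrix DTW recurrence with a pluggable local cost. The function name dtw_cost is hypothetical and does not come from any of the three libraries. The two conventions compared (summing absolute differences, versus summing squared differences and taking the square root at the end) are only assumptions about what a given library might do; check each library's documentation for its actual default.

```python
import math


def dtw_cost(a, b, local_cost):
    """Return the accumulated cost of the cheapest warping path (no window)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = local_cost(a[i - 1], b[j - 1])
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


s1 = [1.0, 2.0, 3.0, 4.0]
s2 = [4.0, 3.0, 2.0, 1.0]

# Convention A (assumed): sum of absolute differences along the path.
abs_total = dtw_cost(s1, s2, lambda x, y: abs(x - y))

# Convention B (assumed): sum of squared differences, square root at the end.
sq_total = math.sqrt(dtw_cost(s1, s2, lambda x, y: (x - y) ** 2))

print(abs_total, sq_total)
```

Both calls align the same two sequences, yet the totals differ, so distances from libraries that use different local costs or a different final normalization are not directly comparable.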

Another reason for the speed difference is the underlying implementation. Some of the libraries are written in pure Python, while others are written in C, which can easily be 100x faster. One way to speed up dtaidistance is to set a maximum distance threshold: the algorithm stops the calculation once it determines that the total distance will exceed that value:

distance2 = dtw.distance_fast(array.array('d',s1),array.array('d',s2), max_dist = your_threshold)
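The idea behind such a threshold can be sketched in a few lines. This is not dtaidistance's implementation, only an illustration under two assumptions: the local cost is the absolute difference, and an abandoned computation is reported as infinity. The name dtw_early_abandon is hypothetical. Because local costs are never negative and every warping path crosses every row, the computation can stop as soon as the smallest accumulated cost in a row exceeds the threshold.

```python
def dtw_early_abandon(a, b, max_dist):
    """DTW with absolute-difference cost; returns inf once max_dist is exceeded."""
    inf = float("inf")
    # prev[j] = accumulated cost of aligning the rows seen so far with b[:j]
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        # Costs only grow along a path, so no later cell can drop below this.
        if min(curr) > max_dist:
            return inf
        prev = curr
    return prev[-1] if prev[-1] <= max_dist else inf


s1 = [1.0, 2.0, 3.0, 4.0]
s2 = [4.0, 3.0, 2.0, 1.0]
print(dtw_early_abandon(s1, s2, max_dist=100.0))  # completes the full computation
print(dtw_early_abandon(s1, s2, max_dist=2.0))    # abandons after the first row
```

The saving depends on how tight the threshold is: a generous value changes nothing, while a tight one skips most of the matrix for dissimilar sequences.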

It is also important to note that some libraries may be optimized for longer or shorter arrays. Running the example below on my computer, I get different results:

import numpy as np
from cdtw import pydtw
from dtaidistance import dtw
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

s1 = np.array([1, 2, 3, 4], dtype=np.double)
s2 = np.array([4, 3, 2, 1], dtype=np.double)

%timeit dtw.distance_fast(s1, s2)
4.1 µs ± 28.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit d2 = pydtw.dtw(s1,s2,pydtw.Settings(step = 'p0sym', window = 'palival', param = 2.0, norm = False, compute_path = True)).get_dist()
45.6 µs ± 3.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit d3,_=fastdtw(s1, s2, dist=euclidean)
901 µs ± 9.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In this test, fastdtw is about 220 times slower than dtaidistance and about 20 times slower than cdtw.
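The %timeit magic only works inside IPython or Jupyter. In a plain script, a comparable measurement can be made with the standard-library timeit module. The helper name best_time below is hypothetical, and the built-in sorted is used only as a stand-in workload; replace it with the DTW call you want to measure.

```python
import timeit


def best_time(func, *args, repeat=7, number=1000):
    """Return the best per-call time in seconds over `repeat` batches of `number` calls."""
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    # The minimum is the batch least disturbed by other activity on the machine.
    return min(totals) / number


# Stand-in workload; e.g. best_time(dtw.distance_fast, s1, s2) for a real comparison.
seconds = best_time(sorted, [3, 1, 2])
print(f"{seconds * 1e6:.2f} µs per call")
```

Running every library through the same helper on the same inputs keeps the units and the repetition count identical. Note that %timeit reports a mean and standard deviation, while this helper reports the minimum, so the two figures will not match exactly.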



Tags: python dtw