I'm comparing the libraries dtaidistance, fastdtw and cdtw for DTW computations. This is my code:
from fastdtw import fastdtw
from cdtw import pydtw
import array
from timeit import default_timer as timer
from dtaidistance import dtw, dtw_visualisation as dtwvis
s1 = mySampleSequences[0] # first sample sequence consisting of 3000 samples
s2 = mySampleSequences[1] # second sample sequence consisting of 3000 samples
start = timer()
distance1 = dtw.distance(s1, s2)
end = timer()
start2 = timer()
distance2 = dtw.distance_fast(array.array('d',s1),array.array('d',s2))
end2 = timer()
start3 = timer()
distance3, path3 = fastdtw(s1,s2)
end3 = timer()
start4 = timer()
distance4 = pydtw.dtw(s1,s2).get_dist()
end4 = timer()
print("dtw.distance(x,y) time: "+ str(end - start))
print("dtw.distance(x,y) distance: "+str(distance1))
print("dtw.distance_fast(x,y) time: "+ str(end2 - start2))
print("dtw.distance_fast(x,y) distance: " + str(distance2))
print("fastdtw(x,y) time: "+ str(end3 - start3))
print("fastdtw(x,y) distance: " + str(distance3))
print("pydtw.dtw(x,y) time: "+ str(end4 - start4))
print("pydtw.dtw(x,y) distance: " + str(distance4))
This is the output I get:
- dtw.distance(x,y) time: 22.16925272245262
- dtw.distance(x,y) distance: 1888.8583853746156
- dtw.distance_fast(x,y) time: 0.3889036471839056
- dtw.distance_fast(x,y) distance: 1888.8583853746156
- fastdtw(x,y) time: 0.23296659641047412
- fastdtw(x,y) distance: 27238.0
- pydtw.dtw(x,y) time: 0.13706478039556558
- pydtw.dtw(x,y) distance: 17330.0
My question is: Why do I get different performances and different distances? Thank you very much for your comments.
// edit: The unit of the time measurements is seconds.
Edit: what are the units of the time measurements? I suspect you compared them as if they were all in the same unit. If dtw.distance were reported in microseconds, for example, and the others in milliseconds, you would conclude that dtw.distance is slower when it is actually the opposite.
There are different ways to measure the distance between two points. It could be based on the standard deviation or simply be the Euclidean distance. Here is a list of many of those distances.
Some of them are more computationally intensive than others, and they also have different meanings, so the resulting DTW totals are not directly comparable. fastdtw, for example, accepts the type of distance you want as an input (its dist argument), as described on its GitHub page.
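To make the effect of the local distance concrete, here is a minimal pure-Python sketch of the classic DTW recursion with a pluggable cost function. It is not the code of any of the three libraries; `dtw_total` and the sample data are made up for illustration. As far as I understand, dtaidistance sums squared differences and takes the square root of the total, while fastdtw defaults to the absolute difference (and is an approximation on top of that), so the two cost functions below are only meant to mimic that difference.

```python
import math

def dtw_total(s1, s2, cost):
    """Exact O(n*m) DTW: sum of local costs along the cheapest warping path."""
    inf = float("inf")
    # prev[j] is the best accumulated cost ending at (previous row, column j).
    prev = [0.0] + [inf] * len(s2)
    for a in s1:
        curr = [inf] * (len(s2) + 1)
        for j, b in enumerate(s2, start=1):
            curr[j] = cost(a, b) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]

# Illustrative data, not the sequences from the question.
s1 = [0.0, 2.0, 4.0, 2.0, 0.0]
s2 = [0.0, 0.0, 3.0, 5.0, 1.0]

# Same sequences, same recursion, two different local costs.
absolute = dtw_total(s1, s2, lambda a, b: abs(a - b))
squared_root = math.sqrt(dtw_total(s1, s2, lambda a, b: (a - b) ** 2))

print("sum of absolute differences:", absolute)
print("sqrt of summed squared differences:", squared_root)
```

The two printed values differ even though the inputs and the algorithm are identical, which is why distances from different libraries should only be compared after checking which local cost each one uses.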
Another reason for the speed difference is the underlying code. Some of the implementations are in pure Python, while others are in C, which can easily be 100x faster. One way to speed up dtaidistance is to set a maximum distance threshold (the max_dist argument of dtw.distance). The algorithm stops the calculation as soon as it determines that the total distance will exceed that value.
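The early-stopping idea can be sketched in pure Python as follows. This is only an illustration of the technique under my own assumptions (squared local cost, square root at the end), not dtaidistance's actual implementation; `dtw_with_max_dist` is a hypothetical helper. In dtaidistance itself I believe the equivalent call is `dtw.distance(s1, s2, max_dist=...)`, but check the documentation for your version.

```python
import math

def dtw_with_max_dist(s1, s2, max_dist=None):
    """DTW with squared local cost; returns inf once max_dist cannot be met."""
    inf = float("inf")
    # Compare against the squared threshold because the root is taken last.
    limit = inf if max_dist is None else max_dist ** 2
    prev = [0.0] + [inf] * len(s2)
    for a in s1:
        curr = [inf] * (len(s2) + 1)
        for j, b in enumerate(s2, start=1):
            total = (a - b) ** 2 + min(prev[j], curr[j - 1], prev[j - 1])
            # Costs never decrease along a path, so a cell above the
            # threshold can be discarded safely.
            curr[j] = total if total <= limit else inf
        if min(curr) == inf:
            # Every path is already too expensive: stop early.
            return inf
        prev = curr
    return math.sqrt(prev[-1])

# Illustrative data, not the sequences from the question.
a = [0.0, 2.0, 4.0, 2.0, 0.0]
b = [0.0, 0.0, 3.0, 5.0, 1.0]

print(dtw_with_max_dist(a, b))               # no threshold
print(dtw_with_max_dist(a, b, max_dist=5))   # loose threshold, same result
print(dtw_with_max_dist(a, b, max_dist=1))   # tight threshold, abandoned
```

With a tight threshold the loop exits after a few rows instead of filling the whole matrix, which is where the time saving comes from when you only care about pairs that are close.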
It is also important to note that some libraries might be optimized for longer or shorter arrays. Running a small example on my computer, I get different results: fastdtw is 219 times slower than the dtaidistance lib and 20x slower than cdtw.