Given the following code that computes distances between vectors in list 'vect’:
import numpy as np
vect=([0.123, 0.345, 0.789], [0.234, 0.456, 0.567],[0.134, 0.246, 0.831])
def kn():
for j in vect:
c=np.array(j)
for z in vect:
p=np.array(z)
space = np.linalg.norm(c-p)
print space
kn()
Is there a way to avoid the quadratic complexity that will result from the double ‘for loop’?
Computation as a result of the double for loop (quadratic) ==3X3=9
Ideal computation (what I want) should be (3):
[0.123, 0.345, 0.789] and [0.234, 0.456, 0.567] =
[0.123, 0.345, 0.789] and [0.134, 0.246, 0.831] =
[0.234, 0.456, 0.567] and [0.134, 0.246, 0.831] =
Thanks in advance for your suggestion(s).
To avoid duplicate pairs nested loop should go upwards from the index of the outer loop, i.e.:
Or use a solution provided by the standard library:
Others have noted that you can't avoid the quadratic complexity. But if your concern is performance, you can speed things up considerably by using
scipy.spatial.distance.pdist
, which will have all the looping and function calling happen in C, not Python. The following code returns a square, symmetrical array, but it only makesn *(n-1)/2
calculations:If you want a flattened array with only the unique, off-diagonal values, you can use
scipy.spatial.distance.squareform
:In your case,
pairwise_distances
would be a 3 element long array.