I have two 2D np.arrays, let's call them A and B, both having the same shape. For every row vector in A I need to find the row vector in B that has the minimum cosine distance to it. To do this I use a double for loop, inside of which I look for the minimum value. So basically I do the following:
from scipy.spatial.distance import cosine

l, res = A.shape[0], []
for i in range(l):
    # Pair each distance with its index so min() yields (distance, index).
    minimum = min((cosine(A[i], B[j]), j) for j in range(l))
    res.append(minimum[1])
In the code above, one of the loops is hidden inside a generator expression. Everything works fine, but the double for loop makes it too slow (I tried rewriting it with a double comprehension, which made things a little faster, but it is still slow).
I believe there is a NumPy function that can compute this faster (using some linear algebra). So is there a way to achieve what I want faster?
From the cosine docs we have the following info -

scipy.spatial.distance.cosine(u, v): Computes the Cosine distance between 1-D arrays.

The Cosine distance between u and v is defined as

    1 - (u · v) / (||u||₂ ||v||₂)

where u · v is the dot product of u and v.

Using the above formula, we would have one vectorized solution using NumPy's broadcasting capability, like so -
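The original code listing is not preserved here; below is a sketch of such a broadcasting-based solution, assuming A and B are 2D float arrays with the same shape (the function name is illustrative):

```python
import numpy as np

def closest_rows_cosine(A, B):
    # Dot products between every row of A and every row of B,
    # shape (len(A), len(B)).
    dots = A.dot(B.T)
    # L2 norms of the rows of A and of B.
    norm_A = np.linalg.norm(A, axis=1)
    norm_B = np.linalg.norm(B, axis=1)
    # Cosine-distance matrix via broadcasting: 1 - (u.v) / (||u|| ||v||).
    dists = 1 - dots / (norm_A[:, None] * norm_B[None, :])
    # For each row of A, the index of the closest row of B.
    return dists.argmin(axis=1)
```

Computing all pairwise dot products as one matrix product `A.dot(B.T)` is what replaces the double loop with a single BLAS call.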
Runtime tests -
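The original timing numbers were not preserved; a minimal sketch of how such a timing comparison could be run (array sizes and function names are assumptions):

```python
import timeit

setup = """
import numpy as np
rng = np.random.RandomState(0)
A = rng.rand(100, 50)
B = rng.rand(100, 50)

def loopy(A, B):
    # The original double-loop approach.
    n = A.shape[0]
    res = []
    for i in range(n):
        d = [1 - A[i].dot(B[j]) / (np.linalg.norm(A[i]) * np.linalg.norm(B[j]))
             for j in range(n)]
        res.append(int(np.argmin(d)))
    return res

def vectorized(A, B):
    # The broadcasting-based approach.
    dists = 1 - A.dot(B.T) / (np.linalg.norm(A, axis=1)[:, None] *
                              np.linalg.norm(B, axis=1))
    return dists.argmin(axis=1)
"""

t_loop = timeit.timeit("loopy(A, B)", setup=setup, number=5)
t_vec = timeit.timeit("vectorized(A, B)", setup=setup, number=5)
print(t_loop, t_vec)
```

On typical inputs the vectorized version should be orders of magnitude faster, since the pairwise dot products collapse into one matrix multiplication.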
Verify results -
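The original verification snippet is also missing; a sketch of how the loopy and vectorized results could be checked against each other on random data (shapes are assumptions):

```python
import numpy as np
from scipy.spatial.distance import cosine

rng = np.random.RandomState(0)
A = rng.rand(30, 8)
B = rng.rand(30, 8)

# Reference: the original double loop using scipy's cosine().
loop_res = [min(range(30), key=lambda j: cosine(A[i], B[j]))
            for i in range(30)]

# Broadcasting-based cosine-distance matrix, then argmin per row.
dists = 1 - A.dot(B.T) / (np.linalg.norm(A, axis=1)[:, None] *
                          np.linalg.norm(B, axis=1))
vec_res = dists.argmin(axis=1)

print(np.array_equal(loop_res, vec_res))  # expected: True
```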
Using scipy.spatial.distance.cdist:

It gets the same results as Divakar's answer, but it's not as fast as the broadcasting-based solution.
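The cdist code itself was not preserved; a sketch of that approach (A and B as in the question, function name illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def closest_rows_cdist(A, B):
    # Full pairwise cosine-distance matrix, shape (len(A), len(B)).
    dists = cdist(A, B, metric='cosine')
    # For each row of A, the index of the nearest row of B.
    return dists.argmin(axis=1)
```

cdist avoids writing the formula by hand, at the cost of some overhead compared with the plain matrix-multiplication version.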