My question is about the pdist function in scipy.spatial.distance. I need to calculate the Hamming distance between one 1x64 vector and each of millions of other 1x64 vectors stored in a 2D array, but I can't do it with pdist, because it returns the Hamming distances between every pair of vectors in the same 2D array. Is there a way to compute only the Hamming distances between the vector at a specific index and all the others?
Here is my current code. I use a 1000x64 array for now because a memory error shows up with bigger arrays.
import numpy as np
from scipy.spatial.distance import pdist
ph = np.load('little.npy')
print(pdist(ph, 'hamming').shape)
and the output is:
(499500,)
little.npy holds a 1000x64 array. For example, if I only want the Hamming distances between the 31st vector and all the others, what should I do?
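The shape (499500,) comes from pdist returning a condensed vector with one entry per unordered pair of rows, n(n-1)/2 = 1000·999/2 = 499500 for n = 1000. That count grows quadratically: for a million rows it is roughly 5×10^11 float64 values (about 4 TB), which likely explains the memory error. For a small array you could still expand the result with squareform and take one row. A minimal sketch, assuming random 0/1 data as a hypothetical stand-in for little.npy and using row index 31:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for little.npy: 1000 random binary vectors of length 64.
rng = np.random.default_rng(0)
ph = rng.integers(0, 2, size=(1000, 64))

# Condensed form: one distance per unordered pair of rows, n * (n - 1) / 2 entries.
condensed = pdist(ph, 'hamming')
print(condensed.shape)

# Expand to the full n x n matrix and take the row at index 31.
# This still computes (and stores) every pair, so it does not scale to millions of rows.
row31 = squareform(condensed)[31]
print(row31.shape)
```

This only shows where the numbers come from; it is not a fix for the large case, since the full pairwise computation is what runs out of memory.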
You can use cdist. For example, passing the single-row slice x[3:4] and the full array x to cdist with the 'hamming' metric gives the Hamming distance between the row at index 3 and all the other rows (including the row at index 3 itself). The result is a 2D array with a single row, so you might want to pull that row out immediately to get a 1D result.
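A minimal sketch of that approach, assuming a hypothetical random binary array x in place of the real data. The output has shape (1, n) instead of n(n-1)/2, so memory grows only linearly with the number of rows:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical data standing in for the real array: 1000 binary vectors of length 64.
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=(1000, 64))

index = 3
# x[index:index+1] keeps the query as a 1 x 64 (2D) array.
d2 = cdist(x[index:index+1], x, 'hamming')
print(d2.shape)  # (1, 1000)

# Take the single row to get a 1D array of distances.
d = d2[0]
print(d.shape)   # (1000,)
print(d[index])  # 0.0, the distance from row 3 to itself
```

SciPy's 'hamming' metric is the fraction of positions that differ, so np.mean(x != x[index], axis=1) should give the same values if you prefer plain NumPy.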
I used x[index:index+1] instead of just x[index] so that the input is a 2D array (with just a single row). cdist requires both inputs to be 2D, so you'll get an error if you pass x[index], which is 1D.
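To see the difference, a small sketch (hypothetical random data again) that passes the 1D row x[index] and catches the resulting ValueError, then shows a few equivalent ways of keeping one row 2D:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)
x = rng.integers(0, 2, size=(100, 64))  # hypothetical example data
index = 31

raised = False
try:
    cdist(x[index], x, 'hamming')  # x[index] has shape (64,), i.e. 1D
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Equivalent ways to pass one row as a 2D array of shape (1, 64):
a = cdist(x[index:index+1], x, 'hamming')         # slice
b = cdist(x[[index]], x, 'hamming')               # indexing with a list
c = cdist(x[index][np.newaxis, :], x, 'hamming')  # add a leading axis
print(a.shape)  # (1, 100)
```

The slice form from the answer is arguably the most readable; the other two are just alternatives that produce the same (1, 64) input.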