Use pdist() in Python with a custom distance function

Posted 2020-07-07 04:53

Question:

I have been using scipy.spatial.distance.pdist(...) in Python, and it has proven useful and fast for some of the applications I have been working on.

I need to use a custom pairwise distance function rather than one of the standard metrics selected by the metric argument. As a simple example, suppose I do not want to use the built-in Euclidean distance like this:

 Y = pdist(X, 'euclidean')

Instead, I want to define the Euclidean distance function myself and pass it to pdist() as an argument. How can I pass my own implementation of the Euclidean distance to pdist() and get exactly the same results? An answer to this would show me how to use pdist() with the metrics I actually need.

I know how to do this with pdist() in MATLAB, but not yet in Python. Thanks for any suggestions.

Answer 1:

There is an example in the documentation for pdist:

import numpy as np
from scipy.spatial.distance import pdist

# X is an m-by-n array: m observations, each an n-dimensional vector
dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))
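Since the question asks for exactly the same results as 'euclidean', here is a minimal sketch that compares the two. The array X below is made-up sample data, not something from the question, and np.allclose is used because the two code paths may differ by floating-point rounding.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical sample data: 4 observations in 3 dimensions.
X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])

# Custom metric passed as a callable taking two 1-D vectors.
custom = pdist(X, lambda u, v: np.sqrt(((u - v) ** 2).sum()))

# Built-in metric selected by name.
builtin = pdist(X, 'euclidean')

# Both are condensed distance vectors of length m*(m-1)/2.
same = np.allclose(custom, builtin)
print(same)
```

Note that a Python callable is invoked once per pair of rows, so for large inputs it will generally be slower than the built-in string metrics.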

If you want to use a regular function instead of a lambda function, the equivalent would be:

import numpy as np
from scipy.spatial.distance import pdist

def dfun(u, v):
    # u and v are two rows of X; return their Euclidean distance
    return np.sqrt(((u-v)**2).sum())

dm = pdist(X, dfun)
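The same pattern works for a metric that is not built in. As a rough sketch, the example below builds a weighted Euclidean distance through a closure and expands the result with squareform. The helper make_weighted_euclidean, the weights, and the data are all hypothetical names and values chosen for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def make_weighted_euclidean(w):
    """Return a metric f(u, v) computing sqrt(sum(w * (u - v)**2))."""
    w = np.asarray(w, dtype=float)

    def metric(u, v):
        return np.sqrt((w * (u - v) ** 2).sum())

    return metric

# Hypothetical sample data: 3 observations in 2 dimensions.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Hypothetical weights: the first coordinate counts four times as much.
weights = [4.0, 1.0]

# Condensed vector holding the distances for pairs (0,1), (0,2), (1,2).
condensed = pdist(X, make_weighted_euclidean(weights))

# Full symmetric m-by-m matrix with zeros on the diagonal.
square = squareform(condensed)
print(square)
```

The closure keeps the metric's signature at f(u, v), which is what pdist expects from a callable, while still letting you parameterize it.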