Efficient distance calculation between N points an

I just started using scipy/numpy. I have a 100000×3 array in which each row is a coordinate, and a 1×3 center point. I want to calculate the distance from each row of the array to the center and store the results in another array. What is the most efficient way to do it?

Tags: python arrays numpy scipy

6 answers

聊天终结者

#2 · 2020-02-03 06:05

You may need to specify in more detail which distance function you are interested in, but here is a very simple (and efficient) implementation of squared Euclidean distance based on an inner product (which can be generalized in a straightforward manner to other kinds of distance measures):

In []: from numpy import dot, ones
In []: from numpy.random import randn
In []: P, c= randn(5, 3), randn(1, 3)
In []: dot(((P- c)** 2), ones(3))
Out[]: array([  8.80512,   4.61693,   2.6002,   3.3293,  12.41800])

Where P are your points and c is the center.
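
Since the question asks for distances rather than squared distances, a square root is still needed. Here is a minimal sketch, assuming plain Euclidean distance is the metric wanted; the names `points`, `center`, `sq_dist`, and `dist` are illustrative and not from the answer above:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((100000, 3))   # one coordinate per row
center = rng.standard_normal((1, 3))        # broadcasts against every row

# Squared distances, the same quantity as dot((P - c)**2, ones(3))
sq_dist = ((points - center) ** 2).sum(axis=1)

# Euclidean distances, one value per row
dist = np.sqrt(sq_dist)
```

The result should match `np.linalg.norm(points - center, axis=1)`, which is another common way to write it.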

Fickle 薄情

#3 · 2020-02-03 06:06

You can also expand the squared norm (much like a binomial identity). This is probably the most efficient way to compute distances for a whole matrix of points.

Here is a code snippet that I originally used for a k-Nearest-Neighbors implementation, in Octave, but you can easily adapt it to numpy since it only uses matrix multiplications (the equivalent is numpy.dot()):

% Computing the squared Euclidean distance between each known point (Xapp) and each unknown point (Xtest)
% Note: we expand the squared norm just like a binomial identity:
% ||x1 - x2||^2 = ||x1||^2 + ||x2||^2 - 2*<x1,x2>
[napp, d] = size(Xapp);
[ntest, d] = size(Xtest);

A = sum(Xapp.^2, 2);
A = repmat(A, 1, ntest);

B = sum(Xtest.^2, 2);
B = repmat(B', napp, 1);

C = Xapp*Xtest';

dist = A+B-2.*C;
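
As a rough NumPy translation of the Octave snippet above (a sketch, not the original author's code): broadcasting stands in for `repmat`, and the function name `squared_distances` and the clipping step are my own additions.

```python
import numpy as np

def squared_distances(Xapp, Xtest):
    """Squared Euclidean distance between every row of Xapp and every row of Xtest."""
    A = np.sum(Xapp ** 2, axis=1)[:, np.newaxis]    # shape (napp, 1)
    B = np.sum(Xtest ** 2, axis=1)[np.newaxis, :]   # shape (1, ntest)
    C = np.dot(Xapp, Xtest.T)                       # shape (napp, ntest)
    # Broadcasting replaces repmat; clip tiny negatives caused by rounding
    return np.maximum(A + B - 2.0 * C, 0.0)

rng = np.random.default_rng(0)
Xapp = rng.standard_normal((6, 3))
Xtest = rng.standard_normal((4, 3))
dist2 = squared_distances(Xapp, Xtest)   # shape (6, 4)
```

Apply `np.sqrt` to the result if you need the distances themselves rather than their squares.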

祖国的老花朵

#4 · 2020-02-03 06:12

I would take a look at scipy.spatial.distance.cdist:

http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html

import numpy as np
from scipy.spatial.distance import cdist  # import the submodule function explicitly

a = np.random.normal(size=(10,3))
b = np.random.normal(size=(1,3))

dist = cdist(a, b) # pick the appropriate distance metric

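With the default distance metric (Euclidean), dist is equivalent to the following, except that cdist returns a 10×1 array while this expression returns a 1-D array: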

np.sqrt(np.sum((a-b)**2,axis=1))

although cdist is much more efficient for large arrays (on my machine, for a problem of your size, cdist is faster by a factor of ~35x).
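
To check the equivalence yourself, here is a small sketch comparing the two results. The variable names `d_cdist`, `d_numpy`, and `same` are illustrative, and no timing is measured here:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
a = rng.normal(size=(10, 3))
b = rng.normal(size=(1, 3))

d_cdist = cdist(a, b)                             # shape (10, 1)
d_numpy = np.sqrt(np.sum((a - b) ** 2, axis=1))   # shape (10,)

# Flatten the cdist column before comparing element by element
same = np.allclose(d_cdist.ravel(), d_numpy)
```

To compare speed on your own machine, you could time both expressions with `timeit` on a 100000×3 array.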

Viruses.

#5 · 2020-02-03 06:13

This might not answer your question directly, but if you are after the distances between all pairs of particles, I've found the following solution to be faster than the pdist function in some cases.

import numpy as np

L   = 100       # simulation box dimension
N   = 100       # Number of particles
dim = 2         # Dimensions

# Generate random positions of particles
r = (np.random.random(size=(N,dim))-0.5)*L

# uti is a tuple of two (1-D) numpy arrays
# containing the indices of the upper triangular matrix
uti = np.triu_indices(N, k=1)         # k=1 eliminates diagonal indices

# uti[0] holds the first index i and uti[1] the second index j of each pair (i < j)
dr = r[uti[0]] - r[uti[1]]            # computes differences between particle positions
D = np.sqrt(np.sum(dr*dr, axis=1))    # computes distances; D is a 1-D array of length N*(N-1)/2 = 4950

See my blog post for a more in-depth look at this matter.
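
As a sanity check on the approach above, the sketch below compares it with `scipy.spatial.distance.pdist`, which returns its condensed distances in the same pair order (row by row over the upper triangle). The name `D_pdist` is illustrative, and which version is faster will depend on your machine and on N:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
N, dim, L = 100, 2, 100
r = (rng.random((N, dim)) - 0.5) * L      # random particle positions

i, j = np.triu_indices(N, k=1)            # index pairs with i < j
dr = r[i] - r[j]
D = np.sqrt(np.sum(dr * dr, axis=1))      # length N*(N-1)/2

D_pdist = pdist(r)                        # condensed distances, same ordering
```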

我只想做你的唯一

#6 · 2020-02-03 06:25

I would use the sklearn implementation of the Euclidean distance. Its advantage is that it uses a more efficient expression based on matrix multiplication:

dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))

A simple script would look like this:

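import numpy as np

x = np.random.rand(1000, 3)
y = np.random.rand(1000, 3)

# Row-wise squared norms: dot(x, x) and dot(y, y) for every row
xx = np.einsum('ij,ij->i', x, x)[:, np.newaxis]   # shape (1000, 1)
yy = np.einsum('ij,ij->i', y, y)[np.newaxis, :]   # shape (1, 1000)

# Pairwise distances; clip small negatives caused by rounding before the sqrt
dist = np.sqrt(np.maximum(xx - 2 * np.dot(x, y.T) + yy, 0))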

The advantage of this approach has been nicely described in the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html#sklearn.metrics.pairwise.euclidean_distances

I am using this approach to crunch large data matrices (10000, 10000), with some minor modifications such as using the np.einsum function.

够拽才男人

#7 · 2020-02-03 06:28

# Is this a correct way to find the largest distance between points on a surface?

from math import sqrt

n = int(input( "enter the range : "))
x = list(map(float,input("type x coordinates: ").split()))
y = list(map(float,input("type y coordinates: ").split()))
maxdis = 0  
for i in range(n):
    for j in range(i + 1, n):  # each unordered pair only once; the distance is symmetric
        print(i, j, x[i], x[j], y[i], y[j])
        dist = sqrt((x[j]-x[i])**2+(y[j]-y[i])**2)
        if maxdis < dist:
            maxdis = dist
print(" maximum distance is : {:5g}".format(maxdis))
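
For larger inputs, the double loop above can be replaced by a vectorized NumPy version. This is only a sketch: the function name `max_pairwise_distance` is hypothetical, and it builds an n×n×2 intermediate array, so it assumes n is small enough for that to fit in memory.

```python
import numpy as np

def max_pairwise_distance(x, y):
    """Largest Euclidean distance between any two of the points (x[k], y[k])."""
    pts = np.column_stack((x, y))                          # shape (n, 2)
    diff = pts[:, np.newaxis, :] - pts[np.newaxis, :, :]   # shape (n, n, 2)
    return np.sqrt((diff ** 2).sum(axis=2)).max()

# Three points: (0, 0), (3, 0), (0, 4)
maxdis = max_pairwise_distance([0.0, 3.0, 0.0], [0.0, 0.0, 4.0])
print(" maximum distance is : {:5g}".format(maxdis))
```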
