I just started using scipy/numpy. I have an 100000*3 array, each row is a coordinate, and a 1*3 center point. I want to calculate the distance for each row in the array to the center and store them in another array. What is the most efficient way to do it?
相关问题
- how to define constructor for Python's new Nam
- streaming md5sum of contents of a large remote tar
- How to get the background from multiple images by
- Evil ctypes hack in python
- Correctly parse PDF paragraphs with Python
You may need to specify a more detailed manner the distance function you are interested of, but here is a very simple (and efficient) implementation of Squared Euclidean Distance based on
inner product
(which obviously can be generalized, straightforward manner, to other kind of distance measures):Where
P
are your points andc
is the center.You can also use the development of the norm (similar to remarkable identities). This is probably the most efficent way to compute the distance of a matrix of points.
Here is a code snippet that I originally used for a k-Nearest-Neighbors implementation, in Octave, but you can easily adapt it to numpy since it only uses matrix multiplications (the equivalent is numpy.dot()):
I would take a look at
scipy.spatial.distance.cdist
:http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
dist
for the default distant metric is equivalent to:although
cdist
is much more efficient for large arrays (on my machine for your size problem,cdist
is faster by a factor of ~35x).This might not answer your question directly, but if you are after all permutations of particle pairs, I've found the following solution to be faster than the pdist function in some cases.
See this for a more in-depth look on this matter, on my blog post.
I would use the sklearn implementation of the euclidean distance. The advantage is the usage of the more efficient expression by using Matrix multiplication:
A simple script would look like this:
The advantage of this approach has been nicely described in the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.euclidean_distances.html#sklearn.metrics.pairwise.euclidean_distances
I am using this approach to crunch large datamatrices (10000, 10000) with some minor modifications like using the np.einsum function.