Vectorize haversine distance computation along pat

2019-02-24 08:22发布

问题:

I have a list of coordinates and can calculate a distance matrix among all points using the haversine distance metric.

Coordinates come a as numpy.array of shape (n, 2) of (latitude, longitude) pairs:

[[  16.34576887 -107.90942116]
 [  12.49474931 -107.76030036]
 [  27.79461514 -107.98607881]
 ...
 [  12.90258404 -107.96786569]
 [  -6.29109889 -107.88681145]
 [  -2.68531605 -107.72796034]]

I can also extract the distance along the path implied by the sequence of coordinates like so:

coordinates = np.deg2rad(coordinates)
lat, lng = coordinates[:, 0], coordinates[:, 1]
diff_lat = lat[:, None] - lat
diff_lng = lng[:, None] - lng

d = np.sin(diff_lat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat) * np.sin(diff_lng / 2) ** 2
dist_matrix = 2 * 6371 * np.arcsin(np.sqrt(d))
np.diagonal(dist_matrix, offset=1)

[   428.51472359   1701.42935402   1849.52714339  12707.47743385
  13723.9087041    4521.8250695    2134.258953      401.33113696
   4571.69119707     73.82631307   6078.48898641   9870.17140175
                               ...
   2109.57319898  12959.56540448  16680.64546196   3050.96912506
   3419.95053226   4209.71641445   9467.85523888   2805.65191129
   4120.18701177]

I would like to only calculate the distance vector as opposed to the entire matrix and then selecting the relevant diagonal.

回答1:

Here's one way you can vectorize that calculation without creating a big matrix. coslat is the array of cosines of the latitudes, and coslat[:-1]*coslat[1:] is the vectorized version of the expression cos(ϕ1)cos(ϕ2) in the Haversine formula.

from __future__ import division, print_function

import numpy as np


def hav(theta):
    return np.sin(theta/2)**2


coords = [[  16.34576887, -107.90942116],
          [  12.49474931, -107.76030036],
          [  27.79461514, -107.98607881],
          [  12.90258404, -107.96786569],
          [  -6.29109889, -107.88681145],
          [  -2.68531605, -107.72796034]]
r = 6371

coordinates = np.deg2rad(coords)
lat = coordinates[:, 0]
lng = coordinates[:, 1]
coslat = np.cos(lat)
t = hav(np.diff(lat)) + coslat[:-1]*coslat[1:]*hav(np.diff(lng))
d = 2*r*np.arcsin(np.sqrt(t))

print(d)

Output:

[  428.51472353  1701.42935412  1655.91938575  2134.25895299   401.33113696]