I am trying to calculate the geodesic distance for a dataframe with around 3 million rows and four columns of latitude and longitude data. I used apply with a lambda to do it, but it took 18 minutes to finish the task. Is there a way to use vectorization with NumPy arrays to speed up the calculation? Thank you for answering.
My code using apply with a lambda:
from geopy import distance
df['geo_dist'] = df.apply(lambda x: distance.distance(
(x['start_latitude'], x['start_longitude']),
(x['end_latitude'], x['end_longitude'])).miles, axis=1)
Update:
I tried the code below, but it raises the error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). I would appreciate any help.
df['geo_dist'] = distance.distance(
(df['start_latitude'].values, df['start_longitude'].values),
(df['end_latitude'].values, df['end_longitude'].values)).miles
I think you might consider using geopandas for this. It's an extension of pandas (and therefore numpy) designed to do these types of calculations very quickly.
Specifically, it has a method for calculating the distance between sets of points in a GeoSeries, which can be a column of a GeoDataFrame. I'm fairly certain that this method leverages numexpr for vectorization.
It should look something like this, where you convert your data frame to a GeoDataFrame with (at least) two GeoSeries columns that you can use for the origin and destination points. This should return a GeoSeries object:
The answer to your question: You cannot do what you want to do with geopy. I am not familiar with this package, but the error traceback shows that this function, and possibly all other functions in this package, were not written with vectorized computations in mind.
Now, if you can make do with great-circle distances, then I would suggest that you experiment with the astropy.coordinates package, which may be able to compute separations between points in a vectorized way.
Here is an example based on my answer to a different question, Finding closest point:
Then, distances between the two sets of points can be computed as:
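For illustration, the same idea can be sketched in plain NumPy instead of astropy.coordinates. This is a minimal haversine sketch and not the astropy code: the function name angular_separation is hypothetical, and the sample coordinates are my own choice (the first pair is assumed to be the Newport, RI to Cleveland, OH example from geopy's documentation; the second is a quarter-circle sanity check). It treats the Earth as a sphere, so it gives great-circle separations rather than true geodesic ones.

```python
import numpy as np


def angular_separation(lat1, lon1, lat2, lon2):
    """Great-circle central angle in radians between paired points.

    Inputs are latitudes/longitudes in degrees (scalars or equal-length arrays).
    Hypothetical helper: a haversine stand-in, not an astropy or geopy API.
    """
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine term; clipped to [0, 1] to guard against rounding error.
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 2.0 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Assumed sample data: one row per start/end pair.
start_lat = np.array([41.49008, 0.0])
start_lon = np.array([-71.312796, 0.0])
end_lat = np.array([41.499498, 0.0])
end_lon = np.array([-81.695391, 90.0])

# One vectorized call handles all rows at once, with no Python-level loop.
sep = angular_separation(start_lat, start_lon, end_lat, end_lon)
```

With the data frame from the question, the inputs would be the four columns as arrays, e.g. df['start_latitude'].to_numpy() and so on.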
Approximate conversion to distance:
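As a rough sketch of that conversion in plain NumPy (not the astropy code): an angular separation in radians times an Earth radius gives an arc length. The function name separation_to_miles, the radius constant, and the sample angles are assumptions for illustration, and the result is a spherical approximation rather than an ellipsoidal geodesic distance.

```python
import numpy as np

# Assumed mean Earth radius in miles (about 6371.0088 km); sphere approximation.
EARTH_RADIUS_MILES = 3958.7613


def separation_to_miles(sep_rad, radius=EARTH_RADIUS_MILES):
    """Convert great-circle central angles (radians) to arc lengths.

    Hypothetical helper: arc length = radius * angle, applied element-wise.
    """
    return np.asarray(sep_rad, dtype=float) * radius


# Made-up sample angles: zero, a small separation, and a quarter circle.
sep_rad = np.array([0.0, 0.01, np.pi / 2])
miles = separation_to_miles(sep_rad)
```

Because the Earth is not a perfect sphere, the result can differ slightly from an ellipsoidal geodesic distance such as the one geopy's distance.distance computes.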
Compare the first value with what you would get from geopy's example:
EDIT: Actually, this may quite possibly give you the geodesic distance that you are after, but make sure to check the description of EarthLocation.