Calculate total miles traveled from vectors of lat

2019-01-15 20:08发布

问题:

I have a data frame with data about a driver and the route they followed. I'm trying to figure out the total mileage traveled. I'm using the geosphere package but can't figure out the correct way to apply it and get an answer in miles.

> head(df1)
  id       routeDateTime driverId      lat       lon
1  1 2012-11-12 02:08:41      123 76.57169 -110.8070
2  2 2012-11-12 02:09:41      123 76.44325 -110.7525
3  3 2012-11-12 02:10:41      123 76.90897 -110.8613
4  4 2012-11-12 03:18:41      123 76.11152 -110.2037
5  5 2012-11-12 03:19:41      123 76.29013 -110.3838
6  6 2012-11-12 03:20:41      123 76.15544 -110.4506

so far I've tried

spDists(cbind(df1$lon,df1$lat))

and several other functions but can't seem to get a reasonable answer.

Any suggestions?

> dput(df1)
structure(list(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40), routeDateTime = c("2012-11-12 02:08:41", 
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41", 
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41", 
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41", 
"2012-11-12 02:08:41", "2012-11-12 02:09:41", "2012-11-12 02:10:41", 
"2012-11-12 03:18:41", "2012-11-12 03:19:41", "2012-11-12 03:20:41", 
"2012-11-12 03:21:41", "2012-11-12 12:08:41", "2012-11-12 12:09:41", 
"2012-11-12 12:10:41", "2012-11-12 02:08:41", "2012-11-12 02:09:41", 
"2012-11-12 02:10:41", "2012-11-12 03:18:41", "2012-11-12 03:19:41", 
"2012-11-12 03:20:41", "2012-11-12 03:21:41", "2012-11-12 12:08:41", 
"2012-11-12 12:09:41", "2012-11-12 12:10:41", "2012-11-12 02:08:41", 
"2012-11-12 02:09:41", "2012-11-12 02:10:41", "2012-11-12 03:18:41", 
"2012-11-12 03:19:41", "2012-11-12 03:20:41", "2012-11-12 03:21:41", 
"2012-11-12 12:08:41", "2012-11-12 12:09:41", "2012-11-12 12:10:41"
), driverId = c(123, 123, 123, 123, 123, 123, 123, 123, 123, 
123, 456, 456, 456, 456, 456, 456, 456, 456, 456, 456, 789, 789, 
789, 789, 789, 789, 789, 789, 789, 789, 246, 246, 246, 246, 246, 
246, 246, 246, 246, 246), lat = c(76.5716897079255, 76.4432530414779, 
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499, 
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785, 
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383, 
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343, 
76.3357809444424, 76.032417796785, 76.5716897079255, 76.4432530414779, 
76.9089707506355, 76.1115217276383, 76.2901271982118, 76.155437662499, 
76.4115052509587, 76.8397977722343, 76.3357809444424, 76.032417796785, 
76.5716897079255, 76.4432530414779, 76.9089707506355, 76.1115217276383, 
76.2901271982118, 76.155437662499, 76.4115052509587, 76.8397977722343, 
76.3357809444424, 76.032417796785), lon = c(-110.80701574916, 
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505, 
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522, 
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726, 
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111, 
-110.556956546381, -110.24483308522, -110.217355202651, -110.80701574916, 
-110.75247172825, -110.861284852726, -110.203674311982, -110.383751512505, 
-110.450569844106, -110.22185564111, -110.556956546381, -110.24483308522, 
-110.217355202651, -110.80701574916, -110.75247172825, -110.861284852726, 
-110.203674311982, -110.383751512505, -110.450569844106, -110.22185564111, 
-110.556956546381, -110.24483308522, -110.217355202651)), .Names = c("id", 
"routeDateTime", "driverId", "lat", "lon"), row.names = c(NA, 
-40L), class = "data.frame")

回答1:

How about this?

## Setup
library(geosphere)
metersPerMile <- 1609.34
pts <- df1[c("lon", "lat")]

## Pass in two derived data.frames that are lagged by one point
segDists <- distVincentyEllipsoid(p1 = pts[-nrow(df),], 
                                  p2 = pts[-1,])
sum(segDists)/metersPerMile
# [1] 1013.919

(To use one of the faster distance calculation algorithms, just substitute distCosine, distVincentySphere, or distHaversine for distVincentyEllipsoid in the call above.)



回答2:

Be VERY careful with missing data, as distVincentyEllipsoid() returns 0 for distance between any two points with missing coordinates c(NA, NA), c(NA, NA).



回答3:

library(geodist)
geodist(df, sequential = TRUE, measure = "geodesic") # sequence of distance increments
sum(geodist(df, sequential = TRUE, measure = "geodesic")) # total distance in metres
sum(geodist(df, sequential = TRUE, measure = "geodesic")) * 0.00062137 # total distance in miles

Geodesic distances are necessary here because of the long distances involved. The result is 1013.915, slightly different from less-accurate Vincenty distances of geosphere. Street network distances can also be calculated with

library(dodgr)
dodgr_dists(from = df)

... but there has to be a street network, which is not the case for (lat = 76, lon = -110). Where there is a street network, that will by default give you all pair-wise distances routed through the street network , from which the sequential increments are the off-diagonal.



标签: r geolocation