how to group data by LatLong distance in R

Posted 2019-04-17 04:37

Question:

I have a function distance(lat1, lon1, lat2, lon2) that calculates the distance between 2 points.
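The body of distance() isn't shown. Here is a minimal sketch of one possible implementation with the same argument order, so the examples can be followed. It assumes the haversine formula on a spherical Earth (mean radius 6371 km) and returns kilometers. This body is an assumption, not the asker's function.

```r
# Hypothetical body for distance(lat1, lon1, lat2, lon2): haversine
# great-circle distance on a sphere of radius 6371 km, in kilometers.
# Inputs are decimal degrees; vector inputs are recycled element-wise.
distance <- function(lat1, lon1, lat2, lon2) {
  earth_radius_km <- 6371
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * earth_radius_km * asin(pmin(sqrt(a), 1))
}

# Usage: distance from point n to point e (coordinates from the question)
distance(29.730836, -1.729219, 29.198922, -2.700067)
```

A spherical model is a simplification. Packages that work on an ellipsoid will give slightly different numbers.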

Suppose I have a dataframe with some points and values:

# Four points as named (lon, lat) vectors
n <- c(lon = -1.729219, lat = 29.730836)
o <- c(lon = -5.041928, lat = 28.453022)
e <- c(lon = -2.700067, lat = 29.198922)
s <- c(lon = -5.212864, lat = 28.531739)
# One row per point: column 1 is longitude, column 2 is latitude
centro <- matrix(c(n, o, e, s), ncol = 2, byrow = TRUE)
d <- data.frame(lon = centro[, 1], lat = centro[, 2],
                amount = c(3.5, 3.5, 3.5, 3.5), count = c(12, 12, 12, 12))

I want to get a new data frame in which nearby points are merged, with their values aggregated into the closest point (I don't care which one is kept).

For example, suppose I use a radius of 10 km, n and o are 7 km apart, and e and s are each 20 km from every other point. Then I would expect a new data frame with 3 rows: e, s, and a new row whose amount and count are the sums of those of n and o, and whose lat and lon are taken from either n or o.

I suppose there's a simple way to do this in R but I couldn't find it.

Thanks

Answer 1:

If you have the distances between the points, you could use hclust to cluster them hierarchically. Then use cutree with the h argument to cut the tree at the desired distance, and use the resulting groups for the aggregation.
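To make the hclust/cutree idea concrete, here is a minimal base-R sketch. It uses the made-up distances from the question (n and o 7 km apart, every other pair 20 km apart) rather than real coordinates. It assumes single linkage (`method = "single"`). Under single linkage, two points end up in the same group when they are connected by a chain of hops, each no longer than `h`. With hclust's default complete linkage, every pair inside a group must be within `h` instead.

```r
# Hypothetical distances (km) from the question's example:
# n and o are 7 km apart; every other pair is 20 km apart.
pts <- c("n", "o", "e", "s")
dm <- matrix(20, nrow = 4, ncol = 4, dimnames = list(pts, pts))
diag(dm) <- 0
dm["n", "o"] <- dm["o", "n"] <- 7

# Hierarchical clustering on the distance matrix (single linkage),
# then cut the tree at a height equal to the 10 km radius
hc <- hclust(as.dist(dm), method = "single")
group <- cutree(hc, h = 10)
group  # n and o share a group; e and s are each on their own
```

With `h = 10`, the 7 km pair is merged and the 20 km gaps are not, giving the three groups the question expects.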

Maybe something like this (I haven't checked that the output is correct; with these coordinates the distances are mostly in the hundreds of km, which is why the cut height below is 210 km rather than 10 km):

# Calculate pairwise great-circle distances (in meters) and name them.
# distm comes from the geosphere package and expects (lon, lat) columns.
library(geosphere)
distance <- distm(centro)
dimnames(distance) <- list(c("n", "o", "e", "s"), c("n", "o", "e", "s"))
# Use the agnes function because it accepts a dissimilarity matrix,
# and convert the result to an hclust object so cutree can be used
library(cluster)
clusters <- as.hclust(agnes(distance, diss = TRUE))
# Cut the tree at 210 km (distm returns meters)
d$group <- cutree(clusters, h = 210000)
# Finally, use plyr to aggregate: keep the first point's coordinates, sum the values
library(plyr)
ddply(d, .(group),
      function(x) data.frame(lon = x$lon[1], lat = x$lat[1],
                             amount = sum(x$amount), count = sum(x$count)))

HTH



Answer 2:

To calculate distances between geographic coordinates you can use the spDists function from the sp package. From the documentation:

spDists returns a full matrix of distances in the metric of the points if longlat=FALSE, or in kilometers if longlat=TRUE; it uses spDistsN1 in case points are two-dimensional. In case of spDists(x,x), it will compute all n x n distances, not the sufficient n x (n-1)

Note that this function only works if your objects are represented by the spatial classes provided by the sp package (probably a SpatialPointsDataFrame in your case). A small R example:

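library(sp)
data(meuse)
# Convert the data.frame meuse to a SpatialPointsDataFrame.
# meuse's x and y are projected coordinates in meters, so the default longlat = FALSE applies.
coordinates(meuse) <- c("x", "y")
spDists(meuse)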

Note that in your case you want to set the longlat argument of spDists to TRUE to obtain great-circle distances (in kilometers). Because spDists computes the full n x n matrix, time and memory grow quadratically with the number of points. This is fine for small and moderate datasets but can become slow for large ones. If you really need something fast, you could look at Rcpp to write the loop in C++.
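As an intermediate option before reaching for C++, a full great-circle distance matrix can also be computed in vectorized base R with `outer()`, without an explicit loop. This is a sketch that assumes the haversine formula on a sphere of radius 6371 km, so its results may differ slightly from those of spDists(longlat = TRUE). Like spDists, it builds the full n x n matrix, so memory still grows quadratically.

```r
# Full n x n great-circle distance matrix (km) using the haversine formula
# on a sphere of radius 6371 km. Inputs are decimal degrees.
gc_dist_matrix <- function(lon, lat, radius = 6371) {
  phi <- lat * pi / 180
  lambda <- lon * pi / 180
  # outer() evaluates each expression for every pair (i, j)
  dphi <- outer(phi, phi, "-")
  dlambda <- outer(lambda, lambda, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # Matrix first in pmin() so the result keeps its dimensions
  2 * radius * asin(pmin(sqrt(a), 1))
}

# The question's four points
pts <- c("n", "o", "e", "s")
lon <- c(-1.729219, -5.041928, -2.700067, -5.212864)
lat <- c(29.730836, 28.453022, 29.198922, 28.531739)
dm <- gc_dist_matrix(lon, lat)
dimnames(dm) <- list(pts, pts)
round(dm, 1)
```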