I working on a project at the moment, where I have a point feature -- the point feature includes a 142 points -- and multiple polygon (around 10). I want to calculate the distance between every single point and the nearest polygon feature in R.
My current approach is tedious and a bit long winded. I am currently planning to calculate the distance between every single point and every single polygon. For example, I would calculate the distance between the 142 points and Polygon A, the distance between the 142 points and Polygon B, the distance between 142 points and Polygon C, etc. Here is a sample code of one of these distance calculations:
dist_cen_polya <- dist2Line(centroids_coor, polygonA_shp)
After doing these calculations, I would write a code to choose the minimum/nearest distance between every single point and the closest polygon. The issue is that this procedure is tedious.
Does anyone know a package/code which would minimize the effort/computational time of the calculation? I would really like to use a package that compare a single to point to the nearest polygon feature or calculates the distance between a point and all polygons of interest?
Thank you.
Here I am using the gDistance function in the rgeos topology library. I am using a brute force double loop but it is surprisingly fast. It takes less than 2 seconds for 142 points and 10 polygons. I am sure that there is a more elgant way to perform the looping.
require(rgeos)
# CREATE SOME DATA USING meuse DATASET
data(meuse)
coordinates(meuse) <- ~x+y
pts <- meuse[sample(1:dim(meuse)[1],142),]
data(meuse.grid)
coordinates(meuse.grid) = c("x", "y")
gridded(meuse.grid) <- TRUE
meuse.grid[["idist"]] = 1 - meuse.grid[["dist"]]
polys <- as(meuse.grid, "SpatialPolygonsDataFrame")
polys <- polys[sample(1:dim(polys)[1],10),]
plot(polys)
plot(pts,pch=19,cex=1.25,add=TRUE)
# LOOP USING gDistance, DISTANCES STORED IN LIST OBJECT
Fdist <- list()
for(i in 1:dim(pts)[1]) {
pDist <- vector()
for(j in 1:dim(polys)[1]) {
pDist <- append(pDist, gDistance(pts[i,],polys[j,]))
}
Fdist[[i]] <- pDist
}
# RETURN POLYGON (NUMBER) WITH THE SMALLEST DISTANCE FOR EACH POINT
( min.dist <- unlist(lapply(Fdist, FUN=function(x) which(x == min(x))[1])) )
# RETURN DISTANCE TO NEAREST POLYGON
( PolyDist <- unlist(lapply(Fdist, FUN=function(x) min(x)[1])) )
# CREATE POLYGON-ID AND MINIMUM DISTANCE COLUMNS IN POINT FEATURE CLASS
pts@data <- data.frame(pts@data, PolyID=min.dist, PDist=PolyDist)
# PLOT RESULTS
require(classInt)
( cuts <- classIntervals(pts@data$PDist, 10, style="quantile") )
plotclr <- colorRampPalette(c("cyan", "yellow", "red"))( 20 )
colcode <- findColours(cuts, plotclr)
plot(polys,col="black")
plot(pts, col=colcode, pch=19, add=TRUE)
The min.dist vector represents the row number of the polygon. For instance you could subset the nearest polygons by using this vector as such.
near.polys <- polys[unique(min.dist),]
The PolyDist vector contain the actual Cartesian minimum distances in the projection units of the features.
In a polygon you have quite a lot of lines. The distance between a polygon is either zero, if the point lies within the polygon or lies on an edge.
So actually you are looking for two cases:
- Check if the point lies inside of any polygon (remember it might be more then a single polygon
- Get a collection of all edges and calculate each distance of the point from the edge.
The nearest distance gives you the distance of the polygon the edge belongs too.
So this is a simple algorithm that if we consider 10 edges per polygon takes O(10 * 10) * 142 for all of your points. That makes 100 * 142 = 14200 operations. => O(m * deltaE) * n (m is number of polygones, deltaE is the average number of edges per polygon, n is the number of points).
Now we check if we can speed this up. The first thing comming to mind is that we can use bounding box checks or bounding circles for each polygon.
Another idea is to prepare the nearest edges for each polygon for a set of angles. For example if you have 8 angles (every 45°) you can remove all edges from the list that are superseeded by another edge (therefore any point of the removed edge will yield always a greater distance than any point of the other edge of the same polygon.
This way typically you can might reduce the complexity in great margin. If you think of a rectangle you only have one or two edges instead of 4. If you think of a regular 8 edge polygon you might end up with typically one or two and at max 3 edges per polygon.
If you add the normal vector to each edge you can calculate if the point might be inside and you have to perform a scanline or whatever check (or you know its konvex) to be sure.
There are also mapping indexes possible like separating the two dimensional space by x and y in a equidistant mannor. Thisway you only have to test polygones lying in 9 sectors.
The next version might be using an R-tree that each bounding box (circle) of each node has to be checked for the minimum expected distance and the maximum expected distance. So one does not need to check the polygons of a node that result in far greater minimum distances than the maximum distances of another node.
Another thing is if you have a given tree like order like you have with map data. In a street map you always have world -> region -> country -> county -> city -> city sector ->...
Thisway you can search for the nearest location in a map of the whole world containing millions of polygons in a reasonable time mostly <10ms.
So to speak you have a lot of options here. And preprocessing your polygon list and extract relevant edges either by using binary space partition trees of polygons or use an angular approach or even go with something even more fancier. Its up to you. I expect that you end up by doing something in the logarithmic range like O(log(n) * log(deltaE)) becoming O(log(n)) as an average complexity.