I am new to R, and for my current project I have to draw a heat map for a specific type of event. There are around 2 million observations of the event, and each observation has a longitude and a latitude coordinate. I have also converted the map data to a data frame containing 71 districts, each defined by a set of coordinates. I need to decide which district each observation belongs to. I am using the following code:
for (row in 1:nrow(data2015)) {
  point.x <- data2015[row, "Latitude"]
  point.y <- data2015[row, "Longitude"]
  for (name in names(polygonOfdis)) {
    if (point.in.polygon(point.x, point.y, polygonOfdis[[name]]$lat, polygonOfdis[[name]]$long, mode.checked = FALSE)) {
      count[[name]] <- count[[name]] + 1
      break
    }
  }
}
data2015 is the data set for the event; polygonOfdis holds the boundary coordinates of each district, indexed by district name.
For a small data set this algorithm works fine, but for my data set it will take ten hours or more (on a data set only 1/400 of the current size, it runs for 1 to 2 minutes). Is there a better way to find out which observation belongs to which district? My problem is that the point.in.polygon function takes too much time; is there another function that can do this faster?
PS: The current data is actually only 1/10 of the real data I have to process, so I really need a faster way to do this.
This function from the SDMTools package worked well.
Based on @conner-m's suggestion:
Your code is pretty straightforward; your stumbling block is using loops instead of R's vectorization. This code should work, but without any data I can't verify it:
This code also assumes that each point is in one and only one polygon. The inner loop and the tapply could most likely be improved by using the dplyr library. The other listed solution with the PIP algorithm could provide a boost over the built-in method.
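To make the vectorization idea concrete, here is a minimal sketch, not the answerer's original code. It relies on point.in.polygon from the sp package accepting whole vectors of point coordinates, so only the 71 districts are looped over rather than the 2 million observations. The toy data standing in for data2015 and polygonOfdis is hypothetical, and points on a district boundary are counted as inside.

```r
library(sp)  # provides point.in.polygon()

# Hypothetical toy data standing in for data2015 and polygonOfdis
data2015 <- data.frame(Latitude  = c(0.5, 1.5, 0.2, 5),
                       Longitude = c(0.5, 0.5, 0.8, 5))
polygonOfdis <- list(
  A = data.frame(lat = c(0, 1, 1, 0), long = c(0, 0, 1, 1)),
  B = data.frame(lat = c(1, 2, 2, 1), long = c(0, 0, 1, 1))
)

# One label per observation; NA means "not assigned to any district yet"
district <- rep(NA_character_, nrow(data2015))

for (name in names(polygonOfdis)) {
  idx <- which(is.na(district))          # only test still-unassigned points
  if (length(idx) == 0) break
  # Vectorized call: all remaining points are tested against one polygon
  hit <- point.in.polygon(data2015$Latitude[idx], data2015$Longitude[idx],
                          polygonOfdis[[name]]$lat,
                          polygonOfdis[[name]]$long) > 0
  district[idx[hit]] <- name
}

# Observations per district (districts with no points get a count of 0)
count <- table(factor(district, levels = names(polygonOfdis)))
```

Like the original loop, this assigns each point to the first district that contains it, so it makes the same one-polygon-per-point assumption.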
A while ago, I ported a point-in-polygon (PIP) algorithm by W. Randolph Franklin that uses the notion of rays: a ray cast from the point crosses the polygon's boundary an odd number of times if the point is inside, and an even number of times if it lies outside.
The code is quite fast because it is written using Rcpp. It is split into two parts: 1. the PIP algorithm and 2. a wrapper function for classification.
PIP Algorithm
Classification Algorithm
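The crossing-number test described above can be sketched in plain R as follows. This is an illustration of the idea, not the Rcpp implementation this answer refers to, and the names pnpoly and classify_points are hypothetical. The function is vectorized over the query points, so the loop only runs over the polygon's edges.

```r
# Crossing-number (ray casting) test, vectorized over the query points.
# px, py: coordinates of the points; vx, vy: polygon vertices in order.
pnpoly <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n                                   # previous vertex (wraps around)
  for (i in seq_len(n)) {
    # Does edge (j, i) straddle the horizontal line through each point?
    straddles <- (vy[i] > py) != (vy[j] > py)
    # x coordinate where the edge meets that horizontal line
    x_cross <- (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
    # Each crossing to the right of the point flips inside/outside.
    # For horizontal edges straddles is FALSE, so an NA/Inf x_cross is ignored.
    inside <- xor(inside, straddles & (px < x_cross))
    j <- i
  }
  inside
}

# Wrapper: label each point with the first polygon that contains it.
# polys is a named list of data frames with columns x and y (an assumption).
classify_points <- function(px, py, polys) {
  label <- rep(NA_character_, length(px))
  for (name in names(polys)) {
    idx <- which(is.na(label))
    hit <- pnpoly(px[idx], py[idx], polys[[name]]$x, polys[[name]]$y)
    label[idx[hit]] <- name
  }
  label
}

# Usage on a unit square and its right-hand neighbour
polys <- list(
  A = data.frame(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1)),
  B = data.frame(x = c(1, 2, 2, 1), y = c(0, 0, 1, 1))
)
labels <- classify_points(c(0.5, 1.5, -0.1), c(0.5, 0.5, 0.2), polys)
```

Points lying exactly on an edge may be reported as either inside or outside by this test, so a border case needs separate handling if it matters.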
There's a package for that, namely ptinpoly.
Note that you can test several points (see below), but if you test a single one you need a matrix, which is why I use rbind.
You get 0 if the point is inside the polygon, -1 otherwise:
As I said before, you can simultaneously test multiple points:
The package also allows testing for point containment in a 3D polyhedron.