I am using gIntersection to clip a nationwide path network by polygons one at a time from a SpatialPolygonsDataFrame. I am looping through each polygon, clipping the path network, calculating the length of the clipped paths, and saving this to a dataframe called path.lgth:
library(maptools)   # readShapePoly / readShapeLines
library(rgeos)      # gIntersection / gLength

poly <- readShapePoly("C:\\temp\\polygons.shp")
paths <- readShapeLines("C:\\temp\\paths.shp")

# loop through all polygons, clipping the lines
path.lgth <- data.frame()
for (i in 1:length(poly)) {
  clip <- gIntersection(paths, poly[i, ])
  lgth <- gLength(clip)
  vid <- poly@data[i, 3]   # polygon ID
  data <- cbind(vid, lgth)
  path.lgth <- rbind(path.lgth, data)
  print(i)
}
The vid line just extracts the polygon ID to save in the dataframe with the path length.
My problem is that it takes way too long to do the first polygon (around 12 minutes!). Is there a way to speed this up? I'm not sure what gIntersection does mathematically (is it checking all paths to see if they overlay with the polygon?). I have simplified my paths so they are only one feature.
Thanks for your help.
If I understood you correctly, you have N polygons and M paths, and for each polygon you want the total length of the paths that fall inside it, right?
Solution 1
Then, first merge all the lines into one feature, and then do the intersection in a single call with byid = TRUE. This way you get rid of the loop:
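A minimal sketch of this approach (untested; gLineMerge as the merge step and the variable names here are my assumptions):

library(rgeos)

paths.merged <- gLineMerge(paths)                            # all paths as one feature
path.crop <- gIntersection(paths.merged, poly, byid = TRUE)  # one clip per polygon
path.lgth <- gLength(path.crop, byid = TRUE)                 # lengths, named by polygon id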
You should get the lengths marked by the id of the polygons. I am not sure whether the polygon ids will be correct - check them in path.crop. If not, you need to set the id parameter of gIntersection to the ids of the polygons.

Solution 2
I am not sure whether sp::over can be used to make a clever query here? This is worth some examining.

Since I was facing the same kind of issue, here's my workaround to reduce processing time. (It's a 4-year-old question! I hope that some people - like me - are still facing such problems.)
I'd advise first selecting, for each gIntersection step, only the line features that are actually involved, using the closely related function gIntersects, which returns a logical and is much faster than gIntersection!
Therefore your code would look like this:
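A sketch of that idea (untested; it assumes paths holds multiple line features so it can be subset - worth checking, since you mentioned merging yours into a single feature):

library(rgeos)

path.lgth <- data.frame()
for (i in 1:length(poly)) {
  # fast logical test: which path features can intersect polygon i at all?
  hits <- as.vector(gIntersects(paths, poly[i, ], byid = TRUE))
  if (any(hits)) {
    # run the expensive clip only on those candidate features
    clip <- gIntersection(paths[hits, ], poly[i, ])
    lgth <- gLength(clip)
  } else {
    lgth <- 0
  }
  vid <- poly@data[i, 3]
  path.lgth <- rbind(path.lgth, data.frame(vid = vid, lgth = lgth))
}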
It's worth giving this a try on a real dataset to confirm that the output matches your needs.
The first thing to do is to avoid reallocating memory on each pass through the loop.
Instead of path.lgth <- rbind(path.lgth, data), initialize prior to the loop:
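For example (a sketch; the two-column numeric matrix is my assumption, based on the vid/lgth pair you store):

# preallocate one row per polygon instead of growing the object each pass
path.lgth <- matrix(NA_real_, nrow = length(poly), ncol = 2)
colnames(path.lgth) <- c("vid", "lgth")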
Then inside the loop, dump the cbind (gross overkill) and fill the preallocated row directly:
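Something like (assuming vid is numeric):

# write the pair straight into row i - no cbind, no rbind
path.lgth[i, ] <- c(vid, lgth)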
Now, as to overall execution time: you didn't say anything about what CPU (and RAM) you have available, but check out Rprof to get an idea of which steps are taking most of your time.
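For example (the filename is arbitrary):

Rprof("paths.out")          # start profiling to a file
# ... run the loop here ...
Rprof(NULL)                 # stop profiling
summaryRprof("paths.out")   # see where the time went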