Sometimes, when working with SpatialPolygons in R with the sp and rgeos packages, I run into problems when subsetting and dissolving these polygons.
This example shows the types of problems I have had, though I've had it happen with other datasets too.
require(maptools)
require(rgeos)
data(wrld_simpl)
countries <- c("Argentina","Bolivia","Brazil","Chile","Colombia","Ecuador","Guyana","Paraguay","Peru","Suriname","Uruguay","Venezuela")
SAmerica <- subset(wrld_simpl,wrld_simpl@data$NAME == countries[1])
for (i in 2:length(countries)) {
x <- subset(wrld_simpl, wrld_simpl@data$NAME == countries[i])
SAmerica <- gUnion(SAmerica, x)
}
plot(SAmerica)
gIsValid(SAmerica)
shoreline <- gUnaryUnion(SAmerica)
plot(shoreline)
There are visible line segments in the areas where the polygons border each other, and this unioned polygon is not valid in the rgeos package.
I am curious about (1) why this is happening?, and (2) what the best way to fix such problems is. Here, I could definitely find better data for country vector files, but sometimes I might be working with more unique data. How can I clean this up?
The problem is that wrld_simpl is a poor dataset. It should be replaced by a better data, with no topological errors.
A Manifold GIS view of detected errors in wrld_simpl dataset
You can try your approach with a better data and the result will be much better.
countries <- c("Bolivia","Brazil","Chile","Colombia","Ecuador","Guyana",
"Paraguay","Peru","Suriname","Uruguay","Venezuela")
writeOGR(wrld_simpl, dsn = 'd:', layer = 'wrld_simpl', driver = 'ESRI Shapefile' )
# Correct topology (normalize topology with Manifol GIS
# reducing precision to 0.00001
Download topology normalized data from Dropbox
sam_topo <- readOGR(dsn = 'd:/', layer = 'Samerica')
#plot(sam_topo, axes = T)
sam2 <- sam_topo[sam_topo$NAME == "Argentina", ]
for (i in countries) {
x <- sam_topo[sam_topo$NAME == i,]
sam2 <- gUnion(sam2, x)
}
gIsValid(sam2)
shoreline <- gUnaryUnion(sam2)
plot(shoreline)
Here's a simple way to do this using getData(...)
in the raster
package. This avoids the use of for
loops and external software.
library(rgeos)
library(raster)
world <- getData("countries")
countries <- c("Argentina","Bolivia","Brazil","Chile","Colombia","Ecuador","Guyana","Paraguay","Peru","Suriname","Uruguay","Venezuela")
SAmerica <- gUnaryUnion(world[world$ENGLISH %in% countries,])
plot(SAmerica)
gIsValid(SAmerica)
# [1] TRUE
The world database loaded by getData(...)
also has a CONTINENTS
field in the attributes table, in this specific case you could achieve the exact same result using:
SAmerica <- gUnaryUnion(world[world$CONTINENT=="South America",])