I have a list of points that I want to overlay on a map of San Francisco using ggplot2.
Each point is a longitude, latitude pair.
I want the resulting map to be in a longitude/latitude coordinate system.
I managed to reproduce Hadley Wickham's directions for plotting polygon shapefiles using his example file. I am using R 2.15.1 for Windows.
However, I tried to use cdp files downloaded from the UScensus2010cdp package.
Here's my code snippet:
require("rgdal")
require("maptools")
require("ggplot2")
require("sp")
require("plyr")
gpclibPermit() # required for fortify method
require(UScensus2010)
require(UScensus2010cdp)
data(california.cdp10)
sf <- city(name = "san francisco", state="ca")
sf.points = fortify(sf)
I get the following error:
Using name to define regions.
Error in unionSpatialPolygons(cp, invert(polys)) : input lengths differ
In addition: Warning message:
In split(as.numeric(row.names(attr)), addNA(attr[, region], TRUE)) :
NAs introduced by coercion
Does anybody know:
- What is a good value to give to the region parameter of fortify()?
- If that fails, a source of map data with untransformed lat/long coordinates for San Francisco that ggplot2 can draw?
- Alternatively, I found here another map of San Francisco, whose data is translated. Can you tell me how to either translate this data to raw lat/long or make the reverse translation for my set of points?
note:
- unable to access
UScensus2010cdp
, so am using UScensus2000cpd
which replicates the error.
The issue
The issue arises from the fact that fortify.SpatialPolygonsDataFrame
relies on converting the row.names
to numeric, and the rownames of your data are the identifiers.
ggplot2:::fortify.SpatialPolygonsDataFrame
function (model, data, region = NULL, ...)
{
attr <- as.data.frame(model)
if (is.null(region)) {
region <- names(attr)[1]
message("Using ", region, " to define regions.")
}
polys <- split(as.numeric(row.names(attr)), addNA(attr[,
region], TRUE))
cp <- polygons(model)
try_require(c("gpclib", "maptools"))
unioned <- unionSpatialPolygons(cp, invert(polys))
coords <- fortify(unioned)
coords$order <- 1:nrow(coords)
coords
}
In your case
row.names(sf@data)
## [1] "california_586" "california_590" "california_616"
are the identifiers you wish to use as the region parameters, as place
state
and name
do not uniquely identify the three polygons.
# as.character used to coerce from factor
lapply(lapply(sf@data[,c('place','state','name')], unique), as.character)
## $place
## [1] "67000"
##
## $state
## [1] "06"
##
## $name
## [1] "San Francisco"
As a character vector where the elements begin with alphabetic characters, when coerced to numeric, it becomes NA
as.numeric(rownames(sf@data))
## [1] NA NA NA
## Warning message:
## NAs introduced by coercion
Which is one of the warnings given
Solution
- Define a column to be the rownames
- Set the row.names to
NULL
or 1:nrow(sf@data)
So..
# rownames
sf@data[['place_id']] <- rownames(sf@data)
row.names(sf@data) <- NULL
# fortify
sf_ggplot <- fortify(sf, region = 'place_id')
# merge to add the original data
sf_ggplot_all <- merge(sf_ggplot, sf@data, by.x = 'id', by.y = 'place_id')
# very basic and uninteresting plot
ggplot(sf_ggplot_all,aes(x=long,y=lat, group = group)) +
geom_polygon(aes(fill =pop2000)) +
coord_map()