I am pretty new to R
and am still learning some of the ways to troubleshoot problems I encounter. I'm running into one that I'm stuck on and wondered if anyone has suggestions.
I am trying to build a dot density map, but I'm running into an error with the dotsInPolys
function. The line:
scc.rand <- dotsInPolys(sccpolys, as.integer(plotvar), f="random")
Which gives me the error:
> sccdots.rand <- dotsInPolys(sccpolys, as.integer(plotvar), f="random")
Error in dotsInPolys(sccpolys, as.integer(plotvar), f = "random") :
different lengths
The documentation indicates that sccpolys
and plotvar
need to be the same length, but I'm unsure on how to double-check that, or, more importantly, correct the problem. Does anyone have recommendations on how I can check what's wrong? Thanks ahead of time.
Here's the entire set of code I'm working on:
library(maptools)
# Population data
sccpop <- read.csv("nhgis0010_ds98_1970_tract.csv", stringsAsFactors = FALSE)
sccpop.sub <- sccpop[sccpop$COUNTY=="Santa Clara",c(1,3,21,22,23)]
# Shapefile for Census tracts
scctract.shp <- readShapePoly("1970-ca-tracts.shp")
sccpolys <- SpatialPolygonsDataFrame(scctract.shp, data=as(scctract.shp, "data.frame"))
# Merge datasets
sccdata <- merge(sccpolys@data, sccpop.sub, sort=FALSE)
plotvar <- sccdata$C0X001 / 1000 # one dot per 1,000 people
head(sccpolys@data)
head(sccpop.sub)
# Generate random dots in polygons
sccdots.rand <- dotsInPolys(sccpolys, as.integer(plotvar), f="random")
# County boundaries
baycounties.shp <- readShapePoly("ca-counties-1970.shp")
baycounties <- SpatialPolygonsDataFrame(baycounties.shp, data=as(baycounties.shp, "data.frame"))
par(mar=c(0,0,0,0))
plot(baycounties, lwd=0.1)
# Add dots
plot(sccdots.rand, add=TRUE, pch=19, cex=0.1, col="#00880030")
@LincolnMullen is right. After your merge you have:
> length(sccpolys)
[1] 787
and
> length(plotvar)
[1] 210
To account for this you could replace
sccdots.rand <- dotsInPolys(sccpolys, as.integer(plotvar), f="random")
with
sccdots.rand <- dotsInPolys(sccpolys[sccpolys$GISJOIN %in% sccdata$GISJOIN,], as.integer(plotvar), f="random")
The problem is that you have more tracts (i.e., polygons in your shapefile) than you wish to plot. There are 787
tracts and only 210
tracts in Santa Clara. In addition, there is some manipulation of the @data
slot in the SpatialPolygonsDataFrame
that is unnecessary. Here is a solution that cleans up the merge code.
library(maptools)
shp <- readShapePoly("1970-ca-tracts.shp")
sccpop <- read.csv("nhgis0010_ds98_1970_tract.csv", stringsAsFactors = FALSE)
sccpop.sub <- sccpop[sccpop$COUNTY=="Santa Clara",c(1,3,21,22,23)]
shp <- merge(shp, sccpop.sub)
Now we have a SpatialPolygonsDataFrame
with the data, but there are missing values for all non Santa Clara counties. And we want to transform the population as you did above. The first line below does the transformation, adding a column to the data frame. It's easiest just to keep that inside the data frame instead of as an external variable. The second line filters out all the polygons that don't have population associated with them, i.e., the non-Santa Clara counties.
shp@data$plotvar <- as.integer(shp@data$C0X001 / 1000)
shp <- shp[!is.na(shp@data$plotvar), ]
Now we can proceed as you did before.
sccdots.rand <- dotsInPolys(shp, shp@data$plotvar, f="random")
baycounties.shp <- readShapePoly("ca-counties-1970.shp")
par(mar=c(0,0,0,0))
plot(baycounties.shp, lwd=0.1)
plot(sccdots.rand, add=TRUE, pch=19, cex=0.1, col="#00880030")
FWIW, I have had better results using rgdal::readOGR()
for loading shapefiles, but maptools::readShapePoly()
works fine here.