The Census Bureau doesn't provide a nationwide shapefile of public use microdata areas (PUMAs, the smallest geography identifiable in American Community Survey microdata). I tried combining the state-level files with a few different methods, but even the one that de-duplicates identifiers breaks once it hits California. Am I doing something silly, or does this require a difficult workaround? Here's code to reproduce up to the point where things break.
library(taRifx.geo)
library(maptools)
td <- tempdir() ; tf <- tempfile()
# windows-only: use the operating system's internet functions (needed for ftp)
setInternet2( TRUE )
# fetch the ftp directory listing and pull out each state's fips code
download.file( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/" , tf )
al <- readLines( tf )
tl <- al[ grep( "geo/tiger/TIGER2014/PUMA/tl_2014_" , al ) ]
fp <- gsub( "(.*)geo/tiger/TIGER2014/PUMA/tl_2014_([0-9]*)_puma10\\.zip(.*)" , "\\2" , tl )
# get rid of Alaska
fp <- fp[ fp != '02' ]
# construct the full url of every remaining state's zipped shapefile
af <- paste0( "ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/tl_2014_" , fp , "_puma10.zip" )
# download each state, read it in, and rbind it onto the running object `d`
d <- NULL
for ( i in af ){
    # remove the previous iteration's extracted files, if any
    try( file.remove( z ) , silent = TRUE )
    download.file( i , tf , mode = 'wb' )
    z <- unzip( tf , exdir = td )
    b <- readShapePoly( z[ grep( 'shp$' , z ) ] )
    if ( is.null( d ) ) d <- b else d <- taRifx.geo:::rbind.SpatialPolygonsDataFrame( d , b , fix.duplicated.IDs = TRUE )
}
# Error in `row.names<-.data.frame`(`*tmp*`, value = c("d.0", "d.1", "d.2", :
# duplicate 'row.names' are not allowed
# In addition: Warning message:
# non-unique values when setting 'row.names': ‘d.0’, ‘d.1’, ‘d.10’, ‘d.11’, ‘d.12’, ‘d.13’, ‘d.14’, ‘d.15’, ‘d.16’, ‘d.17’, ‘d.18’, ‘d.19’, ‘d.2’, ‘d.3’, ‘d.4’, ‘d.5’, ‘d.6’, ‘d.7’, ‘d.8’, ‘d.9’
Here's another approach, which includes a shortcut for obtaining the FTP directory listing. As @Pop mentioned, the key is to ensure that the polygon IDs are all unique.
library(RCurl)
library(rgdal)
# get the directory listing
u <- 'ftp://ftp2.census.gov/geo/tiger/TIGER2014/PUMA/'
f <- paste0(u, strsplit(getURL(u, ftp.use.epsv = FALSE, ftplistonly = TRUE),
                        '\\s+')[[1]])
# download and extract to tempdir/shps
invisible(sapply(f, function(x) {
  path <- file.path(tempdir(), basename(x))
  download.file(x, destfile=path, mode = 'wb')
  unzip(path, exdir=file.path(tempdir(), 'shps'))
}))
# read in all shps, and prepend the shapefile name to the polygon IDs
shps <- lapply(sub('\\.zip', '', basename(f)), function(x) {
  shp <- readOGR(file.path(tempdir(), 'shps'), x)
  shp <- spChFIDs(shp, paste0(x, '_', sapply(slot(shp, "polygons"), slot, "ID")))
  shp
})
# rbind to a single object
shp <- do.call(rbind, shps)
# plot (note: clipping to the contiguous states for display purposes)
plot(shp, axes=TRUE, xlim=c(-130, -60), ylim=c(20, 50), las=1)
# write out to wd/USA.shp
writeOGR(shp, '.', 'USA', 'ESRI Shapefile')
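If you want to confirm the export worked, a minimal check (assuming USA.shp was written to the working directory as above) is to read the file back in and compare feature counts:
usa <- readOGR('.', 'USA')
nrow(usa)  # should equal the total number of PUMAs across all the state files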
Your problem, as you may have guessed, is due to duplicate polygon IDs in your object d. Indeed, the polygon IDs in every one of your "shp" files start at "0", so the same IDs recur in each state's file. That is why you set fix.duplicated.IDs = TRUE to make them different.
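You can see this for yourself by listing the IDs of any single state's object (a minimal check, reusing the b object from the question's loop):
# print the polygon IDs of one state's file: "0", "1", "2", ...
sapply( slot( b , "polygons" ) , slot , "ID" )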
This is weird, because taRifx.geo:::rbind.SpatialPolygonsDataFrame should have fixed it once you set fix.duplicated.IDs = TRUE. More precisely, that argument is passed on to sp::rbind.SpatialPolygons, which calls the "internal" function sp:::makeUniqueIDs, which finally relies on the function base::make.unique.
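For reference, here is what base::make.unique does with a vector of repeated IDs (a standalone illustration, not part of the chain above):
# make.unique de-duplicates by appending a sequence number to repeats
make.unique( c( "0" , "0" , "0" ) )
# [1] "0"   "0.1" "0.2"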
I did not dig into what goes wrong in this chain. Instead, I advise you to set the IDs of your polygons yourself, rather than relying on the fix.duplicated.IDs option.
To fix it yourself, replace your for-loop with the following code:
d <- NULL
count <- 0
for ( i in af ){
    try( file.remove( z ) , silent = TRUE )
    download.file( i , tf , mode = 'wb' )
    z <- unzip( tf , exdir = td )
    b <- readShapePoly( z[ grep( 'shp$' , z ) ] )
    # assign a globally-unique ID to each polygon before the rbind
    for ( j in seq_along( b@polygons ) )
        b@polygons[[ j ]]@ID <- as.character( j + count )
    count <- count + length( b@polygons )
    if ( is.null( d ) )
        d <- b
    else
        d <- taRifx.geo:::rbind.SpatialPolygonsDataFrame( d , b )
}
The simple for-loop on j overwrites the ID of each polygon in the object b with a counter that keeps increasing across states, so every ID is already unique before b is bound to d.
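Once the loop finishes, you can verify that no duplicate IDs survived (a quick sanity check on the combined object d, not part of the fix itself):
# confirm every polygon ID in the combined object is unique
ids <- sapply( slot( d , "polygons" ) , slot , "ID" )
any( duplicated( ids ) )  # should be FALSE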