I have a list of locations that contains a city, state, zip, latitude and longitude for each location.
I separately have a list of economic indicators at the county level. I've played with the zipcode package, the ggmap package, and several free geocoding sources, including the US Gazetteer files, but can't seem to find a way to match the two pieces.
Are there currently any packages or other sources that do this?
I ended up using the suggestion from Josh O'Brien mentioned above. I took his code and changed state to county, as shown here:
library(sp)
library(maps)
library(maptools)
# The single argument to this function, pointsDF, is a data.frame in which:
# - column 1 contains the longitude in degrees (negative in the US)
# - column 2 contains the latitude in degrees
latlong2county <- function(pointsDF) {
    # Prepare SpatialPolygons object with one SpatialPolygon
    # per county
    counties <- map('county', fill=TRUE, col="transparent", plot=FALSE)
    IDs <- sapply(strsplit(counties$names, ":"), function(x) x[1])
    # Note: the datum name is case-sensitive, so it must be "WGS84"
    counties_sp <- map2SpatialPolygons(counties, IDs=IDs,
                     proj4string=CRS("+proj=longlat +datum=WGS84"))
    # Convert pointsDF to a SpatialPoints object
    pointsSP <- SpatialPoints(pointsDF,
                    proj4string=CRS("+proj=longlat +datum=WGS84"))
    # Use 'over' to get _indices_ of the Polygons object containing each point
    indices <- over(pointsSP, counties_sp)
    # Return the county names of the Polygons object containing each point
    countyNames <- sapply(counties_sp@polygons, function(x) x@ID)
    countyNames[indices]
}
# Test the function using points in Wisconsin and Oregon.
testPoints <- data.frame(x = c(-90, -120), y = c(44, 44))
latlong2county(testPoints)
[1] "wisconsin,juneau" "oregon,crook" # IT WORKS
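Since the goal is to attach county-level indicators to each location, the "state,county" strings returned above still need to be joined to the indicator table. Below is a minimal sketch in base R, assuming the indicator table has state and county name columns. The `indicators` data frame, its column names, and the unemployment values are all made up for illustration.

```r
# IDs in the format returned by latlong2county(): "state,county", lower case.
county_ids <- c("wisconsin,juneau", "oregon,crook")

# Split each ID into its state and county parts.
parts <- do.call(rbind, strsplit(county_ids, ",", fixed = TRUE))
located <- data.frame(state = parts[, 1], county = parts[, 2],
                      stringsAsFactors = FALSE)

# Hypothetical county-level indicator table (values are invented).
indicators <- data.frame(state = c("Wisconsin", "Oregon"),
                         county = c("Juneau", "Crook"),
                         unemployment = c(5.1, 7.3),
                         stringsAsFactors = FALSE)

# Normalise case so the keys match the lower-case names from the maps package.
indicators$state <- tolower(indicators$state)
indicators$county <- tolower(indicators$county)

# Left join: keep every located point, attach indicators where a match exists.
joined <- merge(located, indicators, by = c("state", "county"), all.x = TRUE)
joined
```

Real county names often differ in punctuation and suffixes (for example "St. Clair" versus "st clair"), so a join on names usually needs some cleanup first. Joining on FIPS codes is more robust when both tables carry them.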
Matching ZIP codes to counties is difficult: certain ZIP codes span more than one county, and sometimes more than one state (for example, 30165).
I am not aware of any specific R package that can match these up for you.
However, you can get a nice table from the Missouri Census Data Center.
You can use the following for data extraction: http://bit.ly/S63LNU
A sample output might look like:
state,zcta5,ZIPName,County,County2
01,30165,"Rome, GA",Cherokee AL,
01,31905,"Fort Benning, GA",Russell AL,
01,35004,"Moody, AL",St. Clair AL,
01,35005,"Adamsville, AL",Jefferson AL,
01,35006,"Adger, AL",Jefferson AL,Walker AL
...
Note the County2 column. The accompanying metadata explains the two county fields:
county
The county in which the ZCTA is all or mostly contained. Over 90% of ZCTAs fall entirely within a single county.
county2
The "secondary" county for the ZCTA, i.e. the county which has the 2nd largest intersection with it. Over 90% of the time this value will be blank.
See also ANSI County codes
http://www.census.gov/geo/www/ansi/ansi.html
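Once the table is downloaded, matching it to a list of locations is a plain merge on the ZIP code. The sketch below uses a few of the sample rows shown above, pasted inline so it runs without a file. The `locations` data frame and its column names are assumptions standing in for your own data.

```r
# Sample rows copied from the crosswalk output shown above.
crosswalk_csv <- 'state,zcta5,ZIPName,County,County2
01,30165,"Rome, GA",Cherokee AL,
01,35004,"Moody, AL",St. Clair AL,
01,35006,"Adger, AL",Jefferson AL,Walker AL'

# Read every column as character so leading zeros in the codes survive.
crosswalk <- read.csv(text = crosswalk_csv, colClasses = "character")

# Hypothetical location list keyed by ZIP code.
locations <- data.frame(city = c("Moody", "Adger"),
                        zip = c("35004", "35006"),
                        stringsAsFactors = FALSE)

# Left join: keep every location, attach its primary and secondary county.
matched <- merge(locations, crosswalk,
                 by.x = "zip", by.y = "zcta5", all.x = TRUE)
matched[, c("zip", "city", "County", "County2")]
```

A non-empty County2 flags the ZIP codes that straddle a county line, which are the ones worth checking against the latitude and longitude.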
I think the package "noncensus" is helpful. The following is what I use to match a ZIP code with its county:
### Look up the county for a given ZIP code
library(noncensus)
data(zip_codes)
data(counties)
state_fips = as.numeric(as.character(counties$state_fips))
county_fips = as.numeric(as.character(counties$county_fips))
counties$fips = state_fips*1000+county_fips
zip_codes$fips = as.numeric(as.character(zip_codes$fips))
# test
temp = subset(zip_codes, zip == "30329")
subset(counties, fips == temp$fips)
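The code above stores the combined FIPS code as a number. If the county-level indicator table instead keys counties by 5-character strings such as "01001" (an assumption about your data), a zero-padded string key may be easier to join on. The helper `make_fips` below is hypothetical and not part of noncensus.

```r
# Hypothetical helper: combine state and county FIPS parts into a
# zero-padded 5-character code (2 digits for state, 3 for county).
make_fips <- function(state_fips, county_fips) {
  sprintf("%02d%03d", as.integer(state_fips), as.integer(county_fips))
}

# Example with two state/county pairs.
fips_codes <- make_fips(c(1, 13), c(1, 89))
fips_codes
```

The numeric form drops the leading zero for states with codes below 10, which is harmless as long as both tables are converted the same way.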
A simple option is to use the geocode() function in ggmap, with the option output="more" or output="all".
It accepts flexible input, such as an address or lat/lon, and returns the address, city, county, state, country, postal code, etc., as a list.
require("ggmap")
address <- geocode("Yankee Stadium", output="more")
str(address)
$ lon : num -73.9
$ lat : num 40.8
$ type : Factor w/ 1 level "stadium": 1
$ loctype : Factor w/ 1 level "approximate": 1
$ address : Factor w/ 1 level "yankee stadium, 1 east 161st street, bronx, ny 10451, usa": 1
$ north : num 40.8
$ south : num 40.8
$ east : num -73.9
$ west : num -73.9
$ postal_code : chr "10451"
$ country : chr "united states"
$ administrative_area_level_2: chr "bronx"
$ administrative_area_level_1: chr "ny"
$ locality : chr "new york"
$ street : chr "east 161st street"
$ streetNo : num 1
$ point_of_interest : chr "yankee stadium"
$ query : chr "Yankee Stadium"
Another solution is to use a Census shapefile and the same over() command from the question. I ran into a problem using the maptools base map: because it uses the WGS84 datum, points in North America that were within a few miles of the coast were mapped incorrectly, and about 5% of my data set did not match up.
Try this, using the sp package and Census TIGER/Line shapefiles:
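library(sp)
library(maptools)
# Read the Census TIGER/Line county shapefile (NAD83 datum)
counties <- readShapeSpatial("maps/tl_2013_us_county.shp",
                             proj4string=CRS("+proj=longlat +datum=NAD83"))
# Convert pointsDF to a SpatialPoints object
pointsSP <- SpatialPoints(pointsDF, proj4string=CRS("+proj=longlat +datum=NAD83"))
# For each point, get the attribute row of the county polygon containing it
countynames <- over(pointsSP, counties)
countynames <- countynames$NAMELSAD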