I am looking to automate the process of downloading Census data from all block groups from the US using the tidycensus package. There is instructions from the developer to download all tracts within the US, however, block groups cannot be accessed using the same method.
Here is my current code that does not work
library(tidyverse)
library(tidycensus)
census_api_key("key here")
# create lists of state and county codes
data("fips_codes")
temp <- data.frame(state = as.character(fips_codes$state_code),
county = fips_codes$county_code,
stringsAsFactors = F)
temp <- aggregate(county~state, temp, c)
state <- temp$state
coun <- temp$county
# use map2_df to loop through the files, similar to the "tract" data pull
home <- map2_df(state, coun, function(x,y) {
get_acs(geography = "block group", variables = "B25038_001", #random var
state = x,county = y)
})
The resulting error is
No encoding supplied: defaulting to UTF-8.
Error: parse error: premature EOF
(right here) ------^
A similar approach to convert the counties within each state into a list also does not work
temp <- aggregate(county~state, temp, c)
state <- temp$state
coun <- temp$county
df<- map2_df(state, coun, function(x,y) {
get_acs(geography = "block group", variables = "B25038_001",
state = x,county = y)
})
Error: Result 1 is not a length 1 atomic vector
is returned.
Does anyone have an understanding of how this could be completed? More than likely I am not using functions properly or syntax, and I am also not very good with loops. Any help would be appreciated.
Try this package: totalcensus
at https://github.com/GL-Li/totalcensus. It downloads census data files to your own computer and extracts any data from these files. After set up folders and path, run the code below if you want all block group data in 2015 ACS 5-year survey.
library(totalcensus)
# download the 2015 ACS 5-year survey data, which is about 50 GB.
download_census("acs5year", 2015)
# read block group data of variable B25038_001 from all states plus DC
block_groups <- read_acs5year(
year = 2015,
states = states_DC,
table_contents = "B25038_001",
summary_level = "block group"
)
The extracted data of 217739 block groups of all states and DC:
# GEOID lon lat state population B25038_001 GEOCOMP SUMLEV NAME
# 1: 15000US020130001001 -164.1232 54.80448 AK 982 91 all 150 Block Group 1, Census Tract 1, Aleutians East Borough, Alaska
# 2: 15000US020130001002 -161.1786 55.60224 AK 1116 247 all 150 Block Group 2, Census Tract 1, Aleutians East Borough, Alaska
# 3: 15000US020130001003 -160.0655 55.13399 AK 1206 352 all 150 Block Group 3, Census Tract 1, Aleutians East Borough, Alaska
# 4: 15000US020160001001 178.3388 51.95945 AK 1065 264 all 150 Block Group 1, Census Tract 1, Aleutians West Census Area, Alaska
# 5: 15000US020160002001 -166.8899 53.85881 AK 2038 380 all 150 Block Group 1, Census Tract 2, Aleutians West Census Area, Alaska
# ---
# 217735: 15000US560459511001 -104.7889 43.99520 WY 1392 651 all 150 Block Group 1, Census Tract 9511, Weston County, Wyoming
# 217736: 15000US560459511002 -104.4785 43.76853 WY 2050 742 all 150 Block Group 2, Census Tract 9511, Weston County, Wyoming
# 217737: 15000US560459513001 -104.2575 43.88160 WY 1291 520 all 150 Block Group 1, Census Tract 9513, Weston County, Wyoming
# 217738: 15000US560459513002 -104.1807 43.85406 WY 1046 526 all 150 Block Group 2, Census Tract 9513, Weston County, Wyoming
# 217739: 15000US560459513003 -104.2601 43.84355 WY 1373 547 all 150 Block Group 3, Census Tract 9513, Weston County, Wyoming
The solution was provided by the author of tidycensus
(Kyle Walker), and is as follows:
Unfortunately this just doesn't work at the moment. If it did work,
your code would need to identify the counties within each state within
a function evaluated by map_df
and then stitch together the dataset
county-by-county, and state-by-state. The issue is that block group
data is only available by county, so you'd need to walk through all
3000+ counties in the US in turn. If it did work, a successful call
would look like this:
library(tigris)
library(tidyverse)
library(tidycensus)
library(sf)
ctys <- counties(cb = TRUE)
state_codes <- unique(fips_codes$state_code)[1:51]
bgs <- map_df(state_codes, function(state_code) {
state <- filter(ctys, STATEFP == state_code)
county_codes <- state$COUNTYFP
get_acs(geography = "block group", variables = "B25038_001",
state = state_code, county = county_codes)
})
The issue is that while I have internal logic to allow for multi-state
calls, or multi-county calls within a state, tidycensus can't yet
handle multi-state and multi-county calls simultaneously.