Downloading multiple files in R with variable leng

Posted 2019-02-26 03:59

New member here. Trying to download a large number of files from a website in R (but open to suggestions as well, such as wget.)

From this post, I understand I must create a vector with the desired URLs. My initial problem is to write this vector, since I have 27 states and 34 agencies within each state. I must download one file for each agency for all states. Whereas the state codes are always two characters, the agency codes are 2 to 7 characters long. The URLs would look like this:

http://website.gov/xx_yyyyyyy.zip

where xx is the state code and yyyyyyy is the agency code, between 2 and 7 characters long. I am lost as to how to build such a loop.

I assume I can then download this url list with the following function:

for(i in 1:length(url)){
download.file(urls, destinations, mode="wb")}

Does that make sense?

(Disclaimer: an incomplete version of this post was uploaded earlier. My mistake, sorry!)

3 Answers
对你真心纯属浪费
#2 · 2019-02-26 04:28

This should do the job:

agency <- c("FAA", "DEA", "NTSB")
states <- c("AL", "AK", "AZ", "AR")

# Pair every state with every agency (state first, to match xx_yyyyyyy.zip)
URLs <-
paste0("http://website.gov/",
       rep(states, each = length(agency)),
       "_",
       rep(agency, times = length(states)),
       ".zip")

Then loop through the URLs vector to pull the zip files. An apply-family function such as lapply() makes the loop more compact, although most of the time will be spent on the network transfer either way.
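
As a rough sketch of that loop, assuming the files should be saved under their original names: the helper name `download_all` and its `fetch` argument are made up for illustration, and `tryCatch()` is only there so that one missing file does not stop the rest.

```r
# Hypothetical helper: download each URL in turn, skipping files that already
# exist and recording failures instead of stopping at the first error.
download_all <- function(urls, destinations = basename(urls),
                         fetch = utils::download.file) {
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    if (file.exists(destinations[i])) {
      ok[i] <- TRUE   # already downloaded on an earlier run
      next
    }
    ok[i] <- tryCatch({
      # download.file() returns 0 (invisibly) on success
      fetch(urls[i], destinations[i], mode = "wb") == 0
    }, error = function(e) FALSE)
  }
  names(ok) <- urls
  ok
}

# Usage (not run here, since website.gov is a placeholder):
# status <- download_all(URLs)
# names(status)[!status]   # URLs that failed
```

Returning a named logical vector makes it easy to retry only the URLs that failed.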

Viruses.
#3 · 2019-02-26 04:42

This will download them in batches and take advantage of the speedier simultaneous downloading capabilities of download.file() if the libcurl option is available on your installation of R:

library(purrr)

states <- state.abb[1:27]
agencies <- c("AID", "AMBC", "AMTRAK", "APHIS", "ATF", "BBG", "DOJ", "DOT",
              "BIA", "BLM", "BOP", "CBFO", "CBP", "CCR", "CEQ", "CFTC", "CIA",
              "CIS", "CMS", "CNS", "CO", "CPSC", "CRIM", "CRT", "CSB", "CSOSA",
              "DA", "DEA", "DHS", "DIA", "DNFSB", "DOC", "DOD", "DOE", "DOI")

# One batch per state: build that state's URLs, then fetch them in a single call
walk(states, function(x) {
  urls <- sprintf("http://website.gov/%s_%s.zip", x, agencies)
  download.file(urls, basename(urls), method = "libcurl")
})
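
Since the batch download only works when libcurl is available, a check along these lines might be worth adding. This is a sketch, not part of the answer above: `fetch_batch` and `demo_urls` are hypothetical names, and the fallback simply downloads one file at a time with the default method.

```r
# Example URLs for three states and three agencies (placeholder site).
demo_states   <- state.abb[1:3]
demo_agencies <- c("AID", "AMBC", "AMTRAK")
demo_urls <- as.vector(outer(
  demo_states, demo_agencies,
  function(s, a) sprintf("http://website.gov/%s_%s.zip", s, a)
))

# Hypothetical wrapper: use one simultaneous libcurl call when possible,
# otherwise fall back to a plain sequential loop.
fetch_batch <- function(urls, dest = basename(urls)) {
  if (capabilities("libcurl")) {
    # A single call with several URLs lets libcurl fetch them simultaneously
    download.file(urls, dest, method = "libcurl", mode = "wb")
  } else {
    # Fallback: one file at a time with the default download method
    for (i in seq_along(urls)) {
      download.file(urls[i], dest[i], mode = "wb")
    }
  }
}

# Usage (not run here, since website.gov is a placeholder):
# fetch_batch(demo_urls)
```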
Summer. ? 凉城
#4 · 2019-02-26 04:48

If the agency codes are the same for every state, you could use the code below to create the vector of URLs to loop through. (You will also need a vector of destinations of the same length.)

#Getting all combinations
States <- c("AA","BB")
Agency <- c("ABCDEFG","HIJKLMN")
AllCombinations <- expand.grid(States, Agency)
AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1, "_", AllCombinations$Var2, ".zip")
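
The destinations vector is not defined anywhere in this answer, so here is one possible way to build it, assuming each file should keep its original name. The folder name `agency_zips` is made up, and `tempdir()` is used only so the sketch does not write into your working directory.

```r
# Rebuild the example URL vector so this snippet runs on its own.
States <- c("AA", "BB")
Agency <- c("ABCDEFG", "HIJKLMN")
AllCombinations <- expand.grid(States, Agency)
AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1,
                             "_", AllCombinations$Var2, ".zip")

# Hypothetical output folder; replace with wherever the files should go.
out_dir <- file.path(tempdir(), "agency_zips")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# One destination per URL, named after the file part of the URL.
destinations <- file.path(out_dir, basename(AllCombinationsVec))
```

Because `basename()` is vectorised, `destinations` always has the same length and order as the URL vector.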

You can then try looping through each file something like this:

#loop method

for(i in seq_along(AllCombinationsVec)){
  download.file(AllCombinationsVec[i], destinations[i], mode="wb")}

Another way of looping is to use the apply family, which applies a function to every item in a list or vector. Here mapply() walks through the URLs and destinations in parallel.

#mapply method

mapply(function(x, y) download.file(x, y, mode="wb"), x = AllCombinationsVec, y = destinations)
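
If the agency codes are not the same in every state, expand.grid() would generate URLs that do not exist. One option in that case, sketched here with invented codes, is to keep a named list of agencies per state and build the URLs from that. The list `agencies_by_state` and its contents are purely illustrative.

```r
# Hypothetical lookup: each state code maps to its own agency codes.
agencies_by_state <- list(
  AA = c("AB", "ABCDEFG"),
  BB = c("HIJ")
)

# Build the URLs state by state, then flatten into one character vector.
urls <- unlist(
  Map(function(state, codes) {
    sprintf("http://website.gov/%s_%s.zip", state, codes)
  }, names(agencies_by_state), agencies_by_state),
  use.names = FALSE
)
```

The resulting `urls` vector can be passed to either of the loops above in place of AllCombinationsVec.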