Apply a function that downloads zip files and deletes them

Posted 2019-07-14 01:12

Question:

I am trying to write a function and call it with apply on each row of my dataset. The dataset contains URLs of zip files. Each file should be downloaded and unzipped, and afterwards the extracted TXT files and the zip files should be deleted from the working directory.

head(data)
                                                 data                                                                   URL
1 /files/market_valuation/ru/2017/val170502170509.zip http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip
2 /files/market_valuation/ru/2017/val170424170430.zip http://www.kase.kz/files/market_valuation/ru/2017/val170424170430.zip
3 /files/market_valuation/ru/2017/val170417170423.zip http://www.kase.kz/files/market_valuation/ru/2017/val170417170423.zip
4 /files/market_valuation/ru/2017/val170410170416.zip http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip
5 /files/market_valuation/ru/2017/val170403170409.zip http://www.kase.kz/files/market_valuation/ru/2017/val170403170409.zip
6 /files/market_valuation/ru/2017/val170327170402.zip http://www.kase.kz/files/market_valuation/ru/2017/val170327170402.zip

My function:

Price_KASE <- function(data){
    URL = data[,2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

    apply(data, 1, Price_KASE(data))

And the error message:

Error in download.file(URL, dir) : 
  'url' must be a length-one character vector

Please explain what is wrong with my code and how I can fix it. Thank you.

An alternative using a for loop:

for (i in 1:length(data[,2])){
    URL = data[i, 2]
    dir = basename(URL)
    download.file(URL, dir)
    unzip(dir)
    TXT <- list.files(pattern = "*.TXT")
    zip <- list.files(pattern = "*.zip")
    file.remove(TXT, zip)
}

It seems to work OK, but after the 4th or 5th file I get:

In download.file(URL, dir) :
  cannot open URL 'http://www.kase.kz/files/market_valuation/ru/2017/val170410170416.zip': HTTP status was '503 Service Temporarily Unavailable'

Answer 1:

I think the URLs in your data frame are stored as factors. Try:

data[,2] <- as.character(data[,2])

If you are reading the data from a .csv file or constructing the data frame yourself, consider setting stringsAsFactors = FALSE.
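
As a quick check of the column type, here is a small sketch using a hypothetical one-row data frame built only for illustration (the names `urls`, `as_factor`, and `as_string` are not from your data). Before R 4.0.0, stringsAsFactors defaulted to TRUE, which is how a URL column can silently end up as a factor.

```r
# Hypothetical one-row example; the URL is taken from the question's data.
urls <- "http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip"

as_factor <- data.frame(URL = urls, stringsAsFactors = TRUE)
as_string <- data.frame(URL = urls, stringsAsFactors = FALSE)

class(as_factor$URL)  # "factor"
class(as_string$URL)  # "character"

# basename() and download.file() expect character input, so convert if needed.
as_factor$URL <- as.character(as_factor$URL)
basename(as_factor$URL)  # "val170502170509.zip"
```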

UPDATE:

I noticed something else: when you use 1 as the margin in apply, each row is passed to the function as a single vector, not as a data frame. So the function has to index it with data[2] rather than data[,2], and apply has to be given the function itself (Price_KASE) rather than the result of calling it (Price_KASE(data)). The example below runs to completion and gives the output shown.

data1 <- data.frame(a = "/files/market_valuation/ru/2017/val170502170509.zip",
                b = "http://www.kase.kz/files/market_valuation/ru/2017/val170502170509.zip")


Price_KASE <- function(data){
  URL = data[2]  # each row arrives as a vector, so use [2], not [,2]
  dir = basename(URL)
  download.file(URL, dir)
  unzip(dir)
  TXT <- list.files(pattern = "*.TXT")
  zip <- list.files(pattern = "*.zip")
  file.remove(TXT, zip)
}

data1$b <- as.character(data1$b)

apply(data1, 1, Price_KASE)

#     [,1]
#[1,] TRUE
#[2,] TRUE


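On the '503 Service Temporarily Unavailable' warning from the for loop: one possible cause is the server throttling rapid successive requests, though nothing in this thread confirms that. Below is a minimal sketch that pauses between downloads and carries on when one fails. `fetch_all`, its `pause` argument, and the swappable `fetch` argument are hypothetical names introduced here, and the 2-second delay is a guess.

```r
# Hypothetical helper: download each URL in turn, pausing between requests.
# `fetch` defaults to download.file but can be replaced (e.g. for a dry run).
fetch_all <- function(urls, pause = 2, fetch = download.file) {
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    dest <- basename(urls[i])
    ok[i] <- tryCatch({
      fetch(urls[i], dest, mode = "wb")  # binary mode matters for zip files on Windows
      TRUE
    }, error = function(e) {
      # Report the failure and continue with the next URL.
      message("Failed: ", urls[i], " (", conditionMessage(e), ")")
      FALSE
    })
    Sys.sleep(pause)  # assumed delay; tune to what the server tolerates
  }
  ok
}

# Usage (not run here): ok <- fetch_all(as.character(data$URL))
```

The returned logical vector shows which downloads succeeded, so the failed ones can be retried later.
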
Tags: r function apply