Download a file from HTTPS using download.file(

2020-01-25 06:28发布

I would like to read online data to R using download.file() as shown below.

URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(URL, destfile = "./data/data.csv", method="curl")

Someone suggested to me that I add the line setInternet2(TRUE), but it still doesn't work.

The error I get is:

Warning messages:
1: running command 'curl  "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"  -o "./data/data.csv"' had status 127 
2: In download.file(URL, destfile = "./data/data.csv", method = "curl",  :
  download had nonzero exit status

Appreciate your help.

标签: r csv https
9条回答
爷的心禁止访问
2楼-- · 2020-01-25 06:40

It might be easiest to try the RCurl package. Install the package and try the following:

# install.packages("RCurl")
library(RCurl)
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
## Or 
## x <- getURL(URL, ssl.verifypeer = FALSE)
out <- read.csv(textConnection(x))
head(out[1:6])
#   RT SERIALNO DIVISION PUMA REGION ST
# 1  H      186        8  700      4 16
# 2  H      306        8  700      4 16
# 3  H      395        8  100      4 16
# 4  H      506        8  700      4 16
# 5  H      835        8  800      4 16
# 6  H      989        8  700      4 16
dim(out)
# [1] 6496  188

download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv",destfile="reviews.csv",method="libcurl")
查看更多
爷的心禁止访问
3楼-- · 2020-01-25 06:46

If using RCurl you get an SSL error on the GetURL() function then set these options before GetURL(). This will set the CurlSSL settings globally.

The extended code:

install.packages("RCurl")
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))   
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)

Worked for me on Windows 7 64-bit using R3.1.0!

查看更多
ら.Afraid
4楼-- · 2020-01-25 06:46

Had exactly the same problem as UseR (original question), I'm also using windows 7. I tried all proposed solutions and they didn't work.

I resolved the problem doing as follows:

  1. Using RStudio instead of R console.

  2. Actualising the version of R (from 3.1.0 to 3.1.1) so that the library RCurl runs OK on it. (I'm using now R3.1.1 32bit although my system is 64bit).

  3. I typed the URL address as https (secure connection) and with / instead of backslashes \\.

  4. Setting method = "auto".

It works for me now. You should see the message:

Content type 'text/csv; charset=utf-8' length 9294 bytes
opened URL
downloaded 9294 by
查看更多
SAY GOODBYE
5楼-- · 2020-01-25 06:49

Here's an update as of Nov 2014. I find that setting method='curl' did the trick for me (while method='auto', does not).

For example:

# does not work
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip')

# does not work. this appears to be the default anyway
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='auto')

# works!
download.file(url='https://s3.amazonaws.com/tripdata/201307-citibike-tripdata.zip',
              destfile='localfile.zip', method='curl')
查看更多
Evening l夕情丶
6楼-- · 2020-01-25 06:53

127 means command not found

In your case, curl command was not found. Therefore it means, curl was not found.

You need to install/reinstall CURL. That's all. Get latest version for your OS from http://curl.haxx.se/download.html

Close RStudio before installation.

查看更多
虎瘦雄心在
7楼-- · 2020-01-25 06:53

Offering the curl package as an alternative that I found to be reliable when extracting large files from an online database. In a recent project, I had to download 120 files from an online database and found it to half the transfer times and to be much more reliable than download.file.

#install.packages("curl")
library(curl)
#install.packages("RCurl")
library(RCurl)

ptm <- proc.time()
URL <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
x <- getURL(URL)
proc.time() - ptm
ptm

ptm1 <- proc.time()
curl_download(url =URL ,destfile="TEST.CSV",quiet=FALSE, mode="wb")
proc.time() - ptm1
ptm1

ptm2 <- proc.time()
y = download.file(URL, destfile = "./data/data.csv", method="curl")
proc.time() - ptm2
ptm2

In this case, rough timing on your URL showed no consistent difference in transfer times. In my application, using curl_download in a script to select and download 120 files from a website decreased my transfer times from 2000 seconds per file to 1000 seconds and increased the reliability from 50% to 2 failures in 120 files. The script is posted in my answer to a question I asked earlier, see .

查看更多
登录 后发表回答