R Import - CSV file from password protected URL

Posted 2019-08-12 12:10

Okay - so here is what I'm trying to do.

I've got this password protected CSV file I'm trying to import into R.

I can import it fine using:

read.csv()

and when I run my code in RStudio everything works perfectly.

However, when I try to run my .R file from a Windows batch file (.bat), it doesn't work. I want to use the .bat file so that I can set up a scheduled task to run my code every morning.

Here is my .BAT file:

"E:\R-3.0.2\bin\x64\R.exe" CMD BATCH "E:\Control Files\download_data.R" "E:\Control Files\DailyEmail.txt"

And here is my .R file:

url <- "http://username:password@www.url.csv"

data <- read.csv(url, skip=1)

** Note: my real code contains my actual username/password and the exact location of the CSV. I've used generic placeholders here, as this is work related and posting usernames and passwords is probably frowned upon.

As I've said, this code works fine when I use it in RStudio, but fails when I use the .bat file.

I get the following error message:

Error in download.file(url, "E:/data/data.csv") : cannot open URL 'websiteurl'
In addition: Warning message:
In download.file(url, "E:/data/data.csv") : unable to resolve 'username'
Execution halted

** Above, 'websiteurl' stands for the http URL shown earlier (I can't post links). So apparently the .bat run is having trouble with the username/password? Any thoughts?

* EDIT *

I've gone so far as to try this on Linux, thinking maybe Windows was playing silly buggers.

Just from the terminal, I run Rscript -e "download_data.r" and get the EXACT same error message as I did on Windows. So I suspect this may be a problem with where I'm getting the data. Could the provider be blocking requests from the command line, but not from within RStudio?

2 Answers
贼婆χ
#2 · 2019-08-12 12:55

I have had similar problems that turned out to be caused by file permissions. The .bat file may not run with the same privileges you have when running the code directly from RStudio. Try using Rscript (http://stat.ethz.ch/R-manual/R-devel/library/utils/html/Rscript.html) within your .bat file, like this:

Rscript "E:\Control Files\download_data.R"

What is the purpose of the argument "E:\Control Files\DailyEmail.txt"? Is the program supposed to use it in any way?
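One way to check whether the two environments really differ is to print a few session settings from both RStudio and the .bat run and compare them. The snippet below is only a diagnostic sketch: the choice of settings to inspect is an assumption about what might matter here, not a confirmed cause.

```r
# Diagnostic sketch: run this once inside RStudio and once from the .bat file,
# then compare the two outputs. The fields chosen here are guesses at what
# could differ between the sessions.
session_info <- list(
  download_method = getOption("download.file.method"),  # NULL if not set
  user            = Sys.info()[["user"]],               # account running R
  working_dir     = getwd(),
  http_proxy      = Sys.getenv("http_proxy")            # "" if not set
)
str(session_info)
```

If the download method or the user account differs between the two runs, that narrows down where to look next.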

Bombasti
#3 · 2019-08-12 13:00

So, I've found a solution, which is likely not the most practical for most people, but works for me.

I migrated my project over to a Linux system. Running daily scripts is easier on Linux anyway.

The solution makes use of the wget command on Linux.

You can either run wget directly in your shell script, or use the system() function in R to run it.

The code looks like:

wget -O /home/user/.../file.csv --user=userid --password='password' http://www.url.com/file.csv

And you can do something like:

syscommand <- "wget -O /home/.../file.csv --user=userid --password='password' http://www.url.com/file.csv"

system(syscommand)

in R to download the CSV to a location on your hard drive, then read the CSV using read.csv().
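The download-then-read steps can also be wrapped in a small function. This is a minimal sketch, assuming wget is on the PATH of a Linux system; build_wget_args() and fetch_csv() are hypothetical names, and the destination path, credentials and URL in the usage comment are placeholders.

```r
# Hypothetical helper: builds the wget arguments, shell-quoting each value
# so that spaces or special characters survive the trip through the shell.
build_wget_args <- function(url, dest, user, password) {
  c("-O", shQuote(dest),
    paste0("--user=", shQuote(user)),
    paste0("--password=", shQuote(password)),
    shQuote(url))
}

# Hypothetical wrapper: downloads with wget, stops on a non-zero exit status,
# then reads the saved file. Extra arguments (e.g. skip = 1) go to read.csv().
fetch_csv <- function(url, dest, user, password, ...) {
  status <- system2("wget", build_wget_args(url, dest, user, password))
  if (status != 0) stop("wget failed with exit status ", status)
  read.csv(dest, ...)
}

# Usage (placeholder values, not run here):
# data <- fetch_csv("http://www.url.com/file.csv", "file.csv",
#                   "userid", "password", skip = 1)
```

Checking the exit status means a failed download stops the script instead of read.csv() silently reading a stale or empty file.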

Doing it this way gave me some more insight into the potential root cause of the problem. While the system(syscommand) is running, I get the following output:

Connecting to www.website.com (www.website.com)|ip.ad.re.ss|:80... connected.

HTTP request sent, awaiting response... 401 Unauthorized

Reusing existing connection to www.weburl.com:80.

HTTP request sent, awaiting response... 200 OK

Not sure why it has to send the request twice, or why I get a 401 Unauthorized on the first try.
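The 401-then-200 sequence looks like ordinary HTTP authentication rather than an error: by default wget sends the first request without credentials, the server answers 401 with an authentication challenge, and wget then repeats the request with the credentials. wget documents an --auth-no-challenge option that sends Basic credentials with the first request. The sketch below assumes the server uses Basic authentication; the destination file name, credentials and URL are placeholders.

```r
# Placeholder values; substitute the real destination, credentials and URL.
dest <- "file.csv"

wget_cmd <- paste(
  "wget",
  "--auth-no-challenge",      # send Basic credentials without waiting for a 401
  "-O", shQuote(dest),
  "--user=userid",
  "--password='password'",
  "http://www.url.com/file.csv"
)
print(wget_cmd)

# system(wget_cmd)  # uncomment to run the download
```

Whether this removes the first 401 depends on the server; if it uses a scheme other than Basic, the default challenge-then-retry behaviour is the one to keep.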
