read text file into R from password protected webs

Posted 2019-06-10 09:43

Question:

I had a working script (32-bit Windows) that successfully read a txt file from a password-protected website with read.csv. Below is a snippet of the very simple code:

fname <- "http://www.frontierweather.com/degreedays/StatePopulationWeightedWeatherData_Since2010.txt"
dd2 <- read.csv(fname, sep=",", header=TRUE)

Then I got a new computer (64-bit Windows), and read.csv no longer seems able to get past the website's authentication. Instead of reading in the data, it reads in a garbled data frame that appears to be the website's login page:

> head(dd2)
                   X..DOCTYPE.html.PUBLIC....W3C..DTD.XHTML.1.0.Transitional..EN
1                       http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd>
2                                      <html xmlns=http://www.w3.org/1999/xhtml>
3                                                                         <head>
4                                                    <title>Please login</title>
5                                            <link rel=stylesheet type=text/css 
6         href=http://www.frontierweather.com/amember/templates/css/reset.css />

I've unsuccessfully tried several things to get it to work:

  1. Transferred cookies over from the old machine
  2. Opened the website and, when prompted, allowed Windows to save the user name and password
  3. Prefixed the URL (in fname) with "user:password@"
  4. In Internet Explorer, set the website where the data is stored as a "trusted site"
  5. Checked that all packages are the same on the new computer and the old computer
  6. Verified that both the old and new machines are running the same version (version 9) of Internet Explorer

Any assistance or direction would be greatly appreciated.

Answer 1:

I figured out my problem and since I wasted an entire day trying to solve this, I wanted to share my solution so hopefully it won’t cause the same consternation for others as it caused me.

First, as far as I can tell, the problem has absolutely nothing to do with R or with switching from a 32-bit to a 64-bit machine. Instead, it all seems to stem from a setting in Internet Explorer that was introduced with Internet Explorer 7.

In Internet Explorer’s options there is an “Enable Protected Mode (requires restarting Internet Explorer)” setting that is turned on (checked) by default. Internet Explorer lets you change this setting separately for each of the following security zones: Internet, Local intranet, Trusted sites, and Restricted sites.

After adding the URL my data was being sourced from to the list of Trusted sites, I turned off Enable Protected Mode for that zone by unchecking the box. Once this change was made and Internet Explorer was restarted, the read.csv call (above) worked perfectly.

After doing some further research I found the following:

Protected Mode helps prevents malicious software from exploiting vulnerabilities in Internet Explorer 7, protecting your computer from the most common ways that hackers can gain access to your system. -How To Disable Protected Mode in Internet Explorer 7

Presumably, having Protected Mode enabled (the default setting on my new computer) prevented the cookies containing my username and password from being passed to R and/or back to Internet Explorer when retrieving the data.
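
A cheap guard against this failure mode is to check whether the response looks like an HTML page before parsing it as CSV. The following is only a minimal sketch in base R; the helper names `looks_like_html` and `read_csv_checked` are hypothetical and not part of R or of the original script.

```r
# Hypothetical helper: TRUE when the first non-blank lines look like an
# HTML page (e.g. a "Please login" form) rather than delimited data.
looks_like_html <- function(lines, n = 10) {
  first <- head(lines[nzchar(trimws(lines))], n)
  any(grepl("<!DOCTYPE|<html", first, ignore.case = TRUE))
}

# Hypothetical wrapper: fetch the text once, check it, then parse it.
# Extra arguments (sep, header, ...) are passed through to read.csv.
read_csv_checked <- function(path_or_url, ...) {
  lines <- readLines(path_or_url, warn = FALSE)
  if (looks_like_html(lines)) {
    stop("Got an HTML page (possibly a login form) instead of data: ",
         path_or_url)
  }
  read.csv(text = lines, ...)
}
```

With this in place, `dd2 <- read_csv_checked(fname, sep=",", header=TRUE)` would stop with an error rather than silently returning a data frame built from the login page. It does not fix the authentication itself; it only makes the problem visible sooner.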