R - posting a login form using RCurl

2020-02-26 02:36发布

问题:

I am new to using R to post forms and then download data off the web. I have a question that is probably very easy for someone out there to spot what I am doing wrong, so I appreciate your patience. I have a Win7 PC and Firefox 23.x is my typical browser.

I am trying to post the main form that shows up on

http://www.aplia.com/

I have the following R script:

your.username <- 'username'
your.password <- 'password'
setwd( "C:/Users/Desktop/Aplia/data" )

require(SAScii) 
require(RCurl)
require(XML)
agent="Firefox/23.0" 

options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
curl = getCurlHandle()
curlSetOpt(
cookiejar = 'cookies.txt' ,
useragent = agent,
followlocation = TRUE ,
autoreferer = TRUE ,
curl = curl
)

# list parameters to pass to the website (pulled from the source html)
params <-
list(
'userAgent' = agent,
'screenWidth' = "",
'screenHeight' = "",
'flashMajor' = "",
'flashMinor' = "",
'flashBuild' = "",
'flashPatch' = "",
'redirect' = "",
'referrer' = "http://www.aplia.com",
'txtEmail' = your.username,
'txtPassword' = your.password 
    )

# logs into the form
html = postForm('https://courses.aplia.com/', .params = params, curl = curl)
html

# download a file once form is posted
html <-
getURL(
"http://courses.aplia.com/af/servlet/mngstudents?ctx=filename" ,
curl = curl
)
html

But from there I can tell that I am not getting the page I want, as what is returned into html is a redirect message that appears to be asking me to login again (?):

"\r\n\r\n<html>\r\n<head>\r\n    <title>Aplia</title>\r\n\t<script language=\"JavaScript\" type=\"text/javascript\">\r\n\r\n        top.location.href = \"https://courses.aplia.com/af/servlet/login?action=form&redirect=%2Fservlet%2Fmngstudents%3Fctx%3Dfilename\";\r\n    \r\n\t</script>\r\n</head>\r\n<body>\r\n    Click <a href=\"https://courses.aplia.com/af/servlet/login?action=form&redirect=%2Fservlet%2Fmngstudents%3Fctx%3Dfilename\">here</a> to continue.\r\n</body>\r\n</html>\r\n"

Although I do believe there are a series of redirects that occur once the form is posted successfully (manually, in a browser). How can I tell the form was posted correctly?

I am quite sure that once I can get the post working correctly, I won't have a problem directing R to download the files I need (online activity reports for each of my 500 students this semester). But spent several hours working on this and got stuck. Maybe I need to set more options with the RCurl package that have to do with cookies (as the site does use cookies) ---?

Any help so much appreciated!! I typically use R to handle statistical data so am new to these packages and functions.

回答1:

The answer ends up being very simple. For some reason, I didn't see one option that needs to be included in postForm:

html = postForm('https://courses.aplia.com/', .params = params, curl = curl, style="POST")

And that's it...