I developed some code to scrape traffic data based on this topic. I need to scrape many pages after logging in, but right now my code seems to log in to the site repeatedly, once for each URL. How can I 'reuse' the session to avoid the repeated log-ins so that, hopefully, the code runs faster? Here's the pseudo-code:
library(rvest)

# Stub: build the URL for a given site ID
generateURL <- function(siteID) { return(siteURL) }
# Stub: fetch one page and extract its content, given a session and the filled login form
scrapeContent <- function(siteURL, session, filled_form) { return(content) }

mainPageURL <- 'http://pems.dot.ca.gov/'
pgsession <- html_session(mainPageURL)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, username = 'myUserName', password = 'myPW')

siteIDList <- c(1, 2, 3)
vectorOfContent <- vector(mode = 'list', length = length(siteIDList))  # to store all the content

for (i in seq_along(siteIDList)) {
  url <- generateURL(siteIDList[i])
  vectorOfContent[[i]] <- scrapeContent(url, pgsession, filled_form)
}
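My current guess is that I should submit the login form only once and then navigate with jump_to() on the returned session, so scrapeContent would no longer need the filled_form argument. Roughly like this sketch (the html_nodes(page, 'table') call is just a placeholder, since I haven't worked out the real extraction yet):

library(rvest)

# Log in once; submit_form() returns a session that carries the login cookies
logged_in <- submit_form(pgsession, filled_form)

# Reuse that one session for every page: jump_to() navigates within the
# session and keeps its cookies, so no further log-in should be needed
scrapeContent <- function(siteURL, session) {
  page <- jump_to(session, siteURL)
  html_nodes(page, 'table')  # placeholder selector, not the real extraction
}

for (i in seq_along(siteIDList)) {
  vectorOfContent[[i]] <- scrapeContent(generateURL(siteIDList[i]), logged_in)
}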
But I read the rvest documentation and could not find any details about this. My question: how can I 'reuse' the session to avoid the repeated log-ins? Thanks!