I am trying to download some traffic data from pems.dot.ca.gov, following this topic.
rm(list=ls())
library(rvest)
library(xml2)
library(httr)
url <- "http://pems.dot.ca.gov/?report_form=1&dnode=tmgs&content=tmg_volumes&tab=tmg_vol_ts&export=&tmg_station_id=74250&s_time_id=1369094400&s_time_id_f=05%2F21%2F2013&e_time_id=1371772740&e_time_id_f=06%2F20%2F2013&tod=all&tod_from=0&tod_to=0&dow_5=on&dow_6=on&tmg_sub_id=all&q=obs_flow&gn=hour&html.x=34&html.y=8"
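# Start a session and grab the first form on the page (the login form)
pgsession <- html_session(url)
pgform <- html_form(pgsession)[[1]]
# Fill in the credentials and submit
filled_form <- set_values(pgform,
                          'username' = 'omitted',
                          'password' = 'omitted')
resp <- submit_form(pgsession, filled_form)
# Dig into the returned session: the httr response and its raw body
resp_2 <- resp$response
cont <- resp_2$content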
I checked the class() of these items and found that resp is a 'session', resp_2 is a 'response', and cont is 'raw'. My question is: how can I extract the HTML content correctly so that I can proceed with XPath to pick out the actual data I want from this page? My intuition is that I should parse resp_2, which is a response, but I just cannot make it work. Your help is highly appreciated!
This should do it:
(you'll obviously need to modify the column names)
You need httr::content, which parses a response into content, which in this case is HTML that can easily be parsed with rvest:
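A minimal sketch of that idea, not the original answer's code. Because the PeMS page needs a login, the raw bytes here come from a small made-up HTML table standing in for resp$response$content; the table id, column names and values are assumptions for illustration only.

```r
library(xml2)
library(rvest)

# Stand-in for resp$response$content: the raw bytes of a small HTML page.
# (Hypothetical markup; the real PeMS page will look different.)
cont <- charToRaw(
  "<html><body><table id='demo'>
     <tr><th>Hour</th><th>Flow</th></tr>
     <tr><td>00:00</td><td>120</td></tr>
     <tr><td>01:00</td><td>95</td></tr>
   </table></body></html>"
)

# read_html() accepts a raw vector directly and returns an xml_document.
page <- read_html(cont)

# With a live response, httr::content(resp$response, as = "parsed") should
# give an equivalent document when the server reports the body as text/html.

# Pick the table by XPath and convert it to a data frame.
tbl_node <- html_node(page, xpath = "//table[@id='demo']")
flows <- html_table(tbl_node, header = TRUE)
print(flows)
```

On the real page, the XPath would need to be adjusted to whatever element actually holds the traffic data, and the column names changed accordingly.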