I'm relatively new to R programming and I'm trying to put some of the stuff I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.
Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.
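To make the form submission concrete, here is a minimal base-R sketch that builds the four fields used in the attempts below for a given date and encodes them as an `application/x-www-form-urlencoded` body. Assuming the form has no `enctype` attribute, that is the format a browser would use to submit it. The helper names `td_price_fields()` and `form_urlencode()` are hypothetical, and writing the day and month without zero-padding is an assumption about what the server accepts.

```r
# Hypothetical helper: the date form's fields for one date.
# Field names come from the question; unpadded day/month is an assumption.
td_price_fields <- function(date) {
  date <- as.Date(date)
  list(
    priceDate.year  = format(date, "%Y"),
    priceDate.month = as.character(as.integer(format(date, "%m"))),
    priceDate.day   = as.character(as.integer(format(date, "%d"))),
    submit          = "Show Prices"
  )
}

# Hypothetical helper: encode a named list as name=value pairs joined by "&".
# Spaces become %20 (a browser writes "+", which servers decode the same way).
form_urlencode <- function(fields) {
  # URLencode() is not vectorised in older R versions, so apply it per element
  enc <- function(x) {
    vapply(x, utils::URLencode, character(1),
           reserved = TRUE, USE.NAMES = FALSE)
  }
  values <- vapply(fields, as.character, character(1))
  paste(enc(names(fields)), enc(values), sep = "=", collapse = "&")
}

body <- form_urlencode(td_price_fields("2014-12-15"))
print(body)
# [1] "priceDate.year=2014&priceDate.month=12&priceDate.day=15&submit=Show%20Prices"
```

Printing the body like this is a cheap way to compare what R would send with what the browser's network inspector shows for the same date.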
I have unsuccessfully tried to use two different R packages to submit a request to the US Treasury web server. Here are the two approaches I tried:
Attempt #1 (using RCurl):

```r
library(RCurl)

url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Post the date fields of the form directly to the page URL
td.html <- postForm(url,
                    submit = "Show Prices",
                    priceDate.year = 2014,
                    priceDate.month = 12,
                    priceDate.day = 15,
                    # do not verify the server's SSL certificate
                    .opts = curlOptions(ssl.verifypeer = FALSE))
```
This results in a web page being returned and stored in `td.html`, but all it contains is an error message from the TreasuryDirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.
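Since the browser succeeds where R fails, one difference worth ruling out is how the request body is encoded. Assuming the form has no `enctype` attribute, a browser sends its fields as `application/x-www-form-urlencoded`, whereas httr's `POST()` defaults to `multipart/form-data` unless you pass `encode = "form"`. The sketch below uses a hypothetical function name, `fetch_td_prices()`, and posts to the same URL the question uses; whether that URL is the form's actual target is an assumption.

```r
# Sketch only: submit the date form with httr as a url-encoded body.
# The default URL is the one from the question (assumed to be the form target).
fetch_td_prices <- function(date,
                            url = "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm") {
  date <- as.Date(date)
  fields <- list(
    priceDate.year  = format(date, "%Y"),
    priceDate.month = as.character(as.integer(format(date, "%m"))),
    priceDate.day   = as.character(as.integer(format(date, "%d"))),
    submit          = "Show Prices"
  )
  # encode = "form" sends application/x-www-form-urlencoded
  resp <- httr::POST(url, body = fields, encode = "form")
  httr::stop_for_status(resp)          # turn HTTP errors into R errors
  httr::content(resp, as = "text", encoding = "UTF-8")
}
```

Calling it needs the httr package and network access, for example `html <- fetch_td_prices("2014-12-15")`. If the server still returns its error page, the next thing to compare is the exact target URL the browser posts to.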
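Attempt #2 (using rvest):

```r
library(rvest)

s <- html_session(url)                 # open a browsing session on the form page
f0 <- html_form(s)                     # list of every form on the page
f1 <- set_values(f0[[2]],              # the second form holds the date fields
                 priceDate.year = 2014,
                 priceDate.month = 12,
                 priceDate.day = 15)
test <- submit_form(s, f1)
```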
Unfortunately, this approach doesn't even leave R and results in the following error message from R:
```
Submitting with 'submit'
Error in function (type, msg, asError = TRUE) : <url> malformed
```
I can't seem to figure out how to see what "malformed" URL rvest is sending, so that I can try to diagnose the problem.
Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!
I know this is an old question, but adding the … parameter to `postForm` does the trick as well.

Well, it appears to work with the
`httr` library. The `rvest` library is really just a wrapper around `httr`. It looks like it doesn't do a good job of interpreting absolute paths that lack the server name. If you look at the URL stored in the form object, you see that it has only the path and not the server name, and this appears to confuse `httr`. If you set that URL to the full address, including the server name, before submitting, it seems to work. Perhaps it's a bug that should be reported to `rvest`. (Tested on `rvest_0.1.0`.)
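The workaround described in this answer, giving the form a full URL instead of a bare path, can be sketched in base R without relying on rvest internals. `resolve_form_url()` is a hypothetical helper, and the root-relative path in the example is made up for illustration, since the form's real target path is not shown here.

```r
# Hypothetical helper: turn a form target without a server name into a full
# URL, using the URL of the page the form was read from.
resolve_form_url <- function(target, page_url) {
  # Already absolute (has a scheme followed by "//"): return unchanged
  if (grepl("^[A-Za-z][A-Za-z0-9+.-]*://", target)) {
    return(target)
  }
  scheme <- sub("^([A-Za-z][A-Za-z0-9+.-]*:).*$", "\\1", page_url)       # "https:"
  origin <- sub("^([A-Za-z][A-Za-z0-9+.-]*://[^/]+).*$", "\\1", page_url) # "https://host"
  if (startsWith(target, "//")) {
    return(paste0(scheme, target))     # protocol-relative: //host/path
  }
  if (startsWith(target, "/")) {
    return(paste0(origin, target))     # root-relative: /path
  }
  stop("relative paths such as 'page.htm' are not handled in this sketch")
}

page_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
# Hypothetical root-relative target, for illustration only
resolve_form_url("/GA-FI/FedInvest/example.htm", page_url)
# [1] "https://www.treasurydirect.gov/GA-FI/FedInvest/example.htm"
```

The resolved string is what you would assign back to the form's URL before submitting. If the xml2 package is installed, `xml2::url_absolute(target, page_url)` does the same job and also handles plain relative paths.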