Hi, I have to query a website 10,000 times and I am looking for a really fast way to do it with R.
As a template URL:
url <- "http://mutationassessor.org/?cm=var&var=7,55178574,G,A"
my code is:

library(XML)     # for readHTMLTable()
library(gtools)  # for smartbind()

# first query seeds the result data frame
url <- mydata$mutationassessorurl[1]
rawurl <- readHTMLTable(url)
Mutator <- data.frame(rawurl[[10]])

# remaining queries are appended one at a time
for (i in 2:27566) {
  url <- mydata$mutationassessorurl[i]
  rawurl <- readHTMLTable(url)
  Mutator <- smartbind(Mutator, data.frame(rawurl[[10]]))
  print(i)
}
Using microbenchmark, I measured about 680 milliseconds per query. I was wondering if there is a faster way to do it!
Thanks
One way to speed up http connections is to leave the connection open between requests. The following example shows the difference it makes for httr. The first option is most similar to the default behaviour in RCurl.
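A minimal sketch of the comparison, assuming the `mutationassessor.org` URL from the question; the query parameters and the loop count of 10 are just for illustration, and absolute timings will vary:

```r
library(httr)

base  <- "http://mutationassessor.org"
query <- list(cm = "var", var = "7,55178574,G,A")

# Option 1: a brand-new handle for every request
# (each request pays for DNS lookup + TCP connect, like RCurl's default)
system.time(
  for (i in 1:10) GET(base, query = query, handle = handle(base))
)

# Option 2: one shared handle, so the connection stays open between requests
h <- handle(base)
system.time(
  for (i in 1:10) GET(base, query = query, handle = h)
)

# The per-phase breakdown is available on each response, e.g.:
GET(base, query = query, handle = h)$times
```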
Note the difference in the namelookup and connect timings: if you're sharing a handle, you need to do each of these operations only once, which saves quite a bit of time.
There's quite a lot of request-to-request variation; on average the last two methods should be very similar.