I'm doing a little project in R that involves scraping some football data from a website. Here's the link to one of the years of data:
http://www.sports-reference.com/cfb/years/2007-schedule.html.
As you can see, there is a "Date" column with the dates hyperlinked, this hyperlink takes you to the stats from that particular game, which is the data I would like to scrape. Unfortunately, a lot of games take place on the same dates, which means their hyperlinks are the same. So if I scrape the hyperlinks from the table (which I have done) and then do something like:
url = 'http://www.sports-reference.com/cfb/years/2007-schedule.html'
links = character vector with scraped date links
for (i in 1:length(links)) {
stats = html_session(url) %>%
follow_link(link[i]) %>%
html_nodes('whateverthisnodeis') %>%
html_table()
}
it will scrape from the first link corresponding to each date. For example there were 11 games that took place on Aug 30, 2007, but if I put that in the follow_link function, it grabs data from the first game (Boise St. Weber St.) every time. Is there any way I can specify that I want it to move down the table?
I have already found a workaround by finding out the formula for the urls to which the date hyperlinks take you, but it's a pretty convoluted process, so I thought I'd see if anyone knew how to do it this way.
This is a complete example:
you are going over a loop, but setting to the same variable ever time, try this:
}