How can I scrape data from a website within a fram

2020-04-30 03:18发布

问题:

The following link contains the results of the marathon of Paris: http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon. I want to scrape these results, but the information lies within a frame. I know the basics of scraping with Rvest and Rselenium, but I am clueless on how to retrieve the data within such a frame. To get an idea, one of the things I tried was:

url = "http://www.schneiderelectricparismarathon.com/us/the-race/results/results-marathon"
site = read_html(url)
ParisResults = site %>% html_node("iframe") %>% html_table()
ParisResults = as.data.frame(ParisResults)

Any help in solving this problem would be very welcome!

回答1:

The results are loaded by ajax from the following url :

url="http://www.aso.fr/massevents/resultats/ajax.php?v=1460995792&course=mar16&langue=us&version=3&action=search"
  table <- url %>%
    read_html(encoding="UTF-8") %>%
    html_nodes(xpath='//table[@class="footable"]') %>%
    html_table()

PS: I don't know what ajax is exactly, and I just know basics of rvest

EDIT: in order to answer the question in the comment: I don't have a lot of experience in web scraping. If you only use very basic technics with rvest or xml, you have to understand a little more the web site, and every site has its own structure. For this one, here is how I did:

  1. As you see, in the source code you don't see any results because they are in an iframe, and when inspecting the code, you can see after "RESULTS OF 2016 EDITION":

    class="iframe-xdm iframe-resultats" data-href="http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=3"

  2. Now you can use directly this url : http://www.aso.fr/massevents/resultats/index.php?langue=us&course=mar16&version=2

  3. But you still can get the results. You can then use Chrome developer tools > Network > XHR. When refreshing the page, you can see that the data is loaded from this url (when you choose the Men category) : http://www.aso.fr/massevents/resultats/ajax.php?course=mar16&langue=us&version=2&action=search&fields%5Bsex%5D=F&limiter=&order=

  4. Now you can get the results !

  5. And if you want the second page, etc. you can click on the number of the page, then use developer tool to see what happens !