RSelenium: Scraping links on page

Posted 2019-09-16 01:59

Question:

I'm relatively new to RSelenium. I have successfully logged into a site from which I need to pull all the web links.

The overview page looks like this:

<a title="Search 'A2A'" href="/search?company=a2a&amp;rf=13">A2A</a>
<a title="Search 'ABB'" href="/search?company=abb&amp;rf=13">ABB</a>
<a title="Search 'Achmea'" href="/search?company=achmea&amp;rf=13">Achmea</a>

This continues for roughly another 6,000 links.

I tried the following line to grab all the links, but it did not work:

remDr$findElement(using="link text", value="href")

I'd be very grateful if someone could show me how to grab all the links along with the company names (e.g. 'A2A', 'ABB', 'Achmea').

Regards, mr_bungles
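Why the attempt above falls short: `findElement` (singular) returns only the first matching element, and the `"link text"` strategy matches a link's visible text, so `value = "href"` looks for a link whose text is literally "href". A minimal sketch that stays within RSelenium is below. It assumes `remDr` is the already logged-in `remoteDriver` sitting on the overview page, and that every company link's `href` starts with `/search?company=` (a selector inferred from the sample HTML above, not confirmed for the real site). The helper name `get_company_links` is hypothetical.

```r
# Hypothetical helper: collect every company link using RSelenium alone.
# Assumes `remDr` is an open, logged-in remoteDriver on the overview page
# and that the CSS selector below (inferred from the sample HTML) matches.
get_company_links <- function(remDr, selector = "a[href^='/search?company=']") {
  # findElements() (plural) returns a list of ALL matches;
  # findElement() would return only the first one.
  elems <- remDr$findElements(using = "css selector", value = selector)

  data.frame(
    # Each getter returns a one-element list, hence the [[1]]
    text = vapply(elems, function(e) e$getElementText()[[1]], character(1)),
    link = vapply(elems, function(e) e$getElementAttribute("href")[[1]], character(1)),
    stringsAsFactors = FALSE
  )
}

# Usage (requires a running Selenium session):
# links <- get_company_links(remDr)
# head(links)
```

With ~6,000 links this makes two WebDriver round trips per link, which can be slow; fetching the page source once and parsing it locally avoids that. Selenium may also report `href` as the resolved absolute URL rather than the relative path written in the markup.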

Answer 1:

I suggest you use 'rvest' and 'tidyverse' along with RSelenium.

library(tidyverse)
library(rvest)

url <- 'add your url here'

# Download and parse the page
pg <- read_html(url)

# One row per link: the visible link text and its href attribute
tbl <- tibble(
    text = pg %>% html_nodes('add css selector here') %>% html_text(),
    link = pg %>% html_nodes('add css selector here') %>% html_attr('href')
)
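Note that `read_html(url)` makes its own HTTP request without the browser session's cookies, so it may not see the logged-in page. One way to combine the two packages, sketched here as an assumption rather than the answer's exact method, is to hand the page source of the logged-in RSelenium session to rvest. The selector `a[href^='/search?company=']` is inferred from the question's sample HTML, and `extract_company_links` is a hypothetical helper name; the sketch is run on the three sample links so the output can be checked.

```r
library(rvest)
library(tibble)

# Hypothetical helper: pull link text and href from an HTML string.
# The default selector is an assumption based on the sample markup,
# where every company link's href starts with "/search?company=".
extract_company_links <- function(html, selector = "a[href^='/search?company=']") {
  # Parse once, then select all matching <a> nodes
  links <- read_html(html) %>% html_nodes(selector)
  tibble(
    text = html_text(links),          # visible link text, e.g. "A2A"
    link = html_attr(links, "href")   # entity-decoded href ("&amp;" becomes "&")
  )
}

# The three sample links from the question
sample_html <- paste0(
  '<a title="Search \'A2A\'" href="/search?company=a2a&amp;rf=13">A2A</a>',
  '<a title="Search \'ABB\'" href="/search?company=abb&amp;rf=13">ABB</a>',
  '<a title="Search \'Achmea\'" href="/search?company=achmea&amp;rf=13">Achmea</a>'
)

tbl <- extract_company_links(sample_html)
print(tbl)
```

In a live session the call would be `extract_company_links(remDr$getPageSource()[[1]])`: one WebDriver call fetches the whole page, and all ~6,000 links are then parsed locally.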


Tags: rselenium