# I would like to read the list of .html files to extract data. Appreciate your help.
library(rvest)
library(XML)
library(stringr)
library(data.table)
library(RCurl)
u0 <- "https://www.r-users.com/jobs/"
u1 <- read_html("https://www.r-users.com/jobs/")
download_folder <- ("C:/R/BNB/")
pages <- html_text(html_node(u1, ".results_count"))
Total_Pages <- substr(pages, 4, 7)
TP <- as.numeric(Total_Pages)
# reading first two pages, writing them as separate .html files
for (i in 1:TP) {
url <- paste(u0, "page=/", i, sep = "")
download.file(url, paste(download_folder, i, ".html", sep = ""))
#create html object
html <- html(paste(download_folder, i, ".html", sep = ""))
}
Here is a potential solution:
I could not find the class
.result-count
in the html, so instead I looked for the page-numbers class and pick the highest returned value. Also, the functionhtml
is deprecated thus I replaced it withread_html
. Good luck