I want to create a wrapper around html_nodes that accepts either a CSS selector or an XPath expression. My idea was to build a quoted expression that can be supplied to html_nodes and evaluated on the spot. I figured out how to create the path argument for CSS and XPath respectively, but when I supply this expression to html_nodes it does not work. Why not?
page_parser <- function(dat_list, path = NULL, css = FALSE, attr = "") {
  library(rlang)
  # make css or path argument for html_nodes
  if (css == TRUE) {
    path <- expr(`=`(css, !!path))
  } else {
    path <- expr(`=`(xpath, !!path))
  }
  # extract attribute value
  map(dat_list, possibly(function(x) {
    html_nodes(x, !!path) %>% html_attr(attr) %>% extract(1)
  }, NA)) %>%
    map(1) %>%
    lapply(function(x) ifelse(is_null(x), "", x)) %>%
    unlist()
}
read_html("https://www.freitag.de/autoren/lutz-herden/alexis-tsipras-fall-oder-praezedenzfall") %>% page_parser(path = "//meta[@property='og:title']")
read_html("https://www.freitag.de/autoren/lutz-herden/alexis-tsipras-fall-oder-praezedenzfall") %>% page_parser(path = ".title", css = TRUE)
The function should return the content behind the selector, regardless of whether I specify a CSS selector or an XPath expression.
Best, Moritz
In general, the `!!` operator only works in functions that support quasiquotation. Unfortunately, `rvest::html_nodes` currently does not. (But since it's part of tidyverse, I wouldn't be surprised if the support is added at a later date.)
There are several ways to programmatically provide arguments to a function call, including `do.call()` from base R. However, given that you're using `map` to traverse your page, I recommend pre-setting the `css` or `xpath` argument of `html_nodes` through `purrr::partial()`.
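The code that originally followed this recommendation is not preserved here, so what follows is only a minimal sketch of the `purrr::partial()` idea, not the answerer's exact code. It assumes `dat_list` is a list of already parsed pages, and the helper name `find_nodes`, the inline demo HTML, and the `attr = "content"` default are illustrative choices of mine.

```r
library(rvest)  # html_nodes(), html_attr(); re-exports read_html()
library(purrr)  # partial(), map_chr()

page_parser <- function(dat_list, path, css = FALSE, attr = "content") {
  # Pre-fill either the css or the xpath argument of html_nodes(),
  # so the resulting function only needs the page itself.
  find_nodes <- if (css) {
    partial(html_nodes, css = path)
  } else {
    partial(html_nodes, xpath = path)
  }

  # For each page, take the requested attribute of the first match,
  # falling back to "" when nothing matches or the attribute is absent.
  map_chr(dat_list, function(page) {
    values <- html_attr(find_nodes(page), attr)
    if (length(values) == 0 || is.na(values[1])) "" else values[1]
  })
}

# Hypothetical inline page, used instead of a live URL so the example is reproducible
page <- read_html(paste0(
  '<html><head><meta property="og:title" content="Example title"></head>',
  '<body><h1 class="title" id="main">Headline</h1></body></html>'
))

page_parser(list(page), path = "//meta[@property='og:title']", attr = "content")
page_parser(list(page), path = ".title", css = TRUE, attr = "id")
```

Because `partial()` fixes the argument by name, the choice between CSS and XPath is made once, up front, and no quasiquotation is needed inside the mapped function. A base R alternative along the same lines would be to build a named argument list and pass it to `do.call()`.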