I have this for loop in an R script:
library(rvest)  # html_session(), html_nodes(), html_attr()
library(httr)   # config()

url <- "https://example.com"
page <- html_session(url, config(ssl_verifypeer = FALSE))

# collect the href of every link in the page's table
links <- page %>%
  html_nodes("td") %>%
  html_nodes("tr") %>%
  html_nodes("a") %>%
  html_attr("href")

# file name to save each link under
base_names <- basename(links)

for (i in seq_along(links)) {
  site <- html_session(
    URLencode(paste0("https://example.com", links[i])),
    config(ssl_verifypeer = FALSE)
  )
  # write the raw response body to the working directory
  writeBin(site$response$content, base_names[i])
}
This loops through the links and downloads a text file for each into my working directory. I'm wondering if I can put a return somewhere so that it returns the document.
The reason is that I'm executing my script in NiFi (using ExecuteProcess), and it's not sending my scraped documents down the line. Instead, it just shows the head of my R script. I assume you would wrap the for loop in something like fun <- function(x) {}, but I'm not sure how to integrate the x into an already working scraper.
I need it to return documents down the flow, and not just this:

[screenshot: flowfile content showing only the head of the R script]

Processor config:

[screenshot: ExecuteProcess processor configuration]
Even if you are not familiar with NiFi, any help on the R part would be greatly appreciated. Thanks!
If your intent is to both (1) save the output (with writeBin) and (2) return the values (in a list), then try this:
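A sketch of what that can look like, reusing links, base_names, and the html_session call from your script:

ret <- Map(function(lnk, nm) {
  site <- html_session(
    URLencode(paste0("https://example.com", lnk)),
    config(ssl_verifypeer = FALSE)
  )
  # side effect: save the document to the working directory, as before
  writeBin(site$response$content, nm)
  # the last expression is the function's value, so each document's
  # raw content is also collected into the returned list
  site$response$content
}, links, base_names)

ret is then a list of raw vectors, one element per downloaded document, which you can return from a function.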
The use of Map "zips" together the individual elements. For a base case, the following are identical:
But if you want to use same-index elements from multiple lists, you can do one of the following:
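Two toy versions of the same thing:

# lapply takes only one vector, so the second must be indexed by hand
lapply(1:3, function(i) paste(i, letters[i]))
# Map passes both vectors along, element by element
Map(function(i, ltr) paste(i, ltr), 1:3, letters[1:3])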
where unrolling Map results effectively in:
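Continuing the toy example:

list(
  paste(1, letters[1]),
  paste(2, letters[2]),
  paste(3, letters[3])
)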
The biggest difference between lapply and Map here is that lapply can only accept one vector, whereas Map accepts one or more (practically unlimited), zipping them together. All of the lists used must be the same length or length 1 (recycled), so it's legitimate to do something like this:
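For instance, recycling a length-1 string across the other two vectors (toy data again):

Map(function(i, ltr, fixed) paste(i, ltr, fixed), 1:3, letters[1:3], "Q")
# effectively list(paste(1, "a", "Q"), paste(2, "b", "Q"), paste(3, "c", "Q"))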
Note: Map-versus-mapply is similar to lapply-versus-sapply. For both, the first always returns a list object, while the second will return a vector if and only if every return value is of the same length/dimension; otherwise it too will return a list.
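A quick illustration of that note, using sapply:

sapply(1:3, function(i) i + 1)      # equal lengths, simplified to c(2, 3, 4)
sapply(1:3, function(i) rep(i, i))  # ragged lengths, so it stays a list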