Return value of for() loop as if it were a function

Posted 2019-08-22 10:17

Question:

I have this for loop in an R script:

library(rvest)  # html_session(), html_nodes(), html_attr(), %>%
library(httr)   # config()

url <- "https://example.com"
page <- html_session(url, config(ssl_verifypeer = FALSE))

links <- page %>% 
  html_nodes("td") %>% 
  html_nodes("tr") %>%
  html_nodes("a") %>% 
  html_attr("href")

base_names <- page %>%
  html_nodes("td") %>% 
  html_nodes("tr") %>%
  html_nodes("a") %>% 
  html_attr("href") %>%
  basename()

for (i in seq_along(links)) {

  site <- html_session(URLencode(
    paste0("https://example.com", links[i])),
    config(ssl_verifypeer = FALSE))

  writeBin(site$response$content, base_names[i])
} 

This loops through the links and downloads each text file to my working directory. I'm wondering whether I can put a return() somewhere so that it returns the documents.

The reason is that I'm executing my script in NiFi (using ExecuteProcess), and it's not sending my scraped documents down the line. Instead, it just shows the head of my R script. I assume I would wrap the for loop in a function, e.g. fun <- function(x) {}, but I'm not sure how to integrate the x into an already working scraper.

I need it to return documents down the flow, and not just this:

[screenshot not preserved]

Processor config:

[screenshot not preserved]

Even if you are not familiar with NiFi, help with the R part would be great! Thanks

Answer 1:

If your intent is to both (1) save the output (with writeBin) and (2) return the values (in a list), then try this:

out <- Map(function(ln, bn) {
  site <- html_session(URLencode(
    paste0("https://example.com", ln)),
    config(ssl_verifypeer = FALSE))
  writeBin(site$response$content, bn)
  site$response$content
}, links, base_names)
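
Since the question asks about wrapping the loop in a function, here is a minimal sketch of how that might look. The names download_all, fetch, fake_fetch and dir are hypothetical and not part of the original script: fetch stands in for the html_session() call, and fake_fetch only exists so the example runs without network access.

```r
# Hypothetical wrapper: `fetch` takes one link and returns the body as a raw
# vector (in the real script this would wrap html_session(...)$response$content).
download_all <- function(links, base_names, fetch, dir = tempdir()) {
  Map(function(ln, bn) {
    content <- fetch(ln)
    writeBin(content, file.path(dir, bn))  # save to disk, as the loop did
    content                                # ...and also return it to the caller
  }, links, base_names)
}

# Stand-in fetcher (an assumption, for illustration only).
fake_fetch <- function(ln) charToRaw(paste("contents of", ln))

docs <- download_all(c("/a/one.txt", "/b/two.txt"),
                     c("one.txt", "two.txt"),
                     fetch = fake_fetch)

names(docs)                      # the links are used as names
rawToChar(docs[["/a/one.txt"]])  # the first document as text
```

If the downstream step reads the script's standard output (my understanding of how ExecuteProcess picks up content, but treat that as an assumption and check the NiFi documentation), then something like cat(rawToChar(docs[[1]])) would print a document rather than only saving it.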

Map "zips" together the corresponding elements of its arguments. In the simplest case, with a single list, the following two calls are equivalent:

Map(myfunc, list1)
lapply(list1, myfunc)
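
To make the single-list case concrete, here is a small sketch with made-up data (list1 and myfunc are placeholders, as above). It also shows one difference I'm aware of: when the input is an unnamed character vector, Map uses the values as names while lapply does not.

```r
list1 <- list(1, 2, 3)
myfunc <- function(x) x * 10   # placeholder function for illustration

res_map    <- Map(myfunc, list1)
res_lapply <- lapply(list1, myfunc)
identical(res_map, res_lapply)

# With an unnamed character vector, only Map names the result.
names(Map(toupper, c("a", "b")))
names(lapply(c("a", "b"), toupper))
```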

But if you want to use same-index elements from multiple lists, you can do either of the following:

lapply(seq_along(list1), function(i) myfunc(list1[[i]], list2[[i]], list3[[i]]))
Map(myfunc, list1, list2, list3)

where unrolling Map effectively results in:

myfunc(list1[[1]], list2[[1]], list3[[1]])
myfunc(list1[[2]], list2[[2]], list3[[2]])
# ...
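
A runnable version of the multi-list case, using made-up numbers, might look like the sketch below. Note that it extracts elements with [[i]], since Map passes the elements themselves rather than one-element sub-lists.

```r
list1 <- list(1, 2, 3)
list2 <- list(10, 20, 30)
list3 <- list(100, 200, 300)
myfunc <- function(a, b, c) a + b + c   # placeholder function for illustration

via_map <- Map(myfunc, list1, list2, list3)
via_lapply <- lapply(seq_along(list1), function(i) {
  myfunc(list1[[i]], list2[[i]], list3[[i]])
})

identical(via_map, via_lapply)
unlist(via_map)   # 111 222 333
```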

The biggest difference between lapply and Map here is that lapply iterates over a single vector or list, whereas Map accepts one or more (practically unlimited) and zips them together. Shorter arguments are recycled, so the lists should all be the same length or length 1; it is therefore legitimate to do something like:

Map(myfunc, list1, list2, "constant string")
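
As a quick illustration of the recycling, here is a sketch with placeholder data in which the length-1 string is reused for every pair of elements:

```r
myfunc <- function(a, b, suffix) paste0(a, b, suffix)   # placeholder function

res <- Map(myfunc, list("a", "b"), list("x", "y"), "!")
unlist(res)   # "ax!" "by!"
```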

Note: Map is to mapply as lapply is to sapply. In each pair, the first always returns a list, while the second (by default) simplifies the result to a vector or matrix when every return value has the same length; otherwise it too returns a list.
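
The difference in return types can be checked with a few throwaway calls. This is only a sketch with made-up inputs, relying on the default simplification behaviour of sapply and mapply:

```r
x <- 1:3

as_list <- lapply(x, function(i) i^2)   # always a list
as_vec  <- sapply(x, function(i) i^2)   # simplified to a numeric vector

zipped_list <- Map(function(a, b) a + b, 1:3, 4:6)      # always a list
zipped_vec  <- mapply(function(a, b) a + b, 1:3, 4:6)   # simplified to a vector

# Return values of different lengths cannot be simplified, so sapply
# falls back to a list here.
ragged <- sapply(x, function(i) seq_len(i))

is.list(as_list); is.list(as_vec)
is.list(zipped_list); is.list(zipped_vec)
is.list(ragged)
```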