Getting RSelenium Error: “Failed to decode respons

2019-08-17 19:19发布

I'm relatively new to R (and brand spanking new to scraping with R), so apologies in advance if I'm overlooking something obvious here!

I've been trying to learn how to scrape with RSelenium by following this tutorial: https://rawgit.com/petrkeil/Blog/master/2017_08_15_Web_scraping/web_scraping.html#advanced-scraping-with-rselenium

After running the following in Terminal (docker run -d -p 4445:4444 selenium/standalone-firefox), I tried to run the R code below, pulled with only slight modifications from the tutorial hyperlinked above:

get.tree <- function(genus, species) 
{
  # navigate to the page
  browser <- remoteDriver(port=4445L)
  browser$open(silent = T)

  browser$navigate("http://www.bgci.org/global_tree_search.php?sec=globaltreesearch")
  browser$refresh()

  # create r objects from the web search input and button elements

  genusElem <- browser$findElement(using = 'id', value = "genus-field")
  specElem <- browser$findElement(using = 'id', value = "species-field")
  buttonElem <- browser$fiendElement(using = 'class', value = "btn_ohoDO")

  # tell R to fill in the fields

  genusElem$sendKeysToElement(list(genus))
  specElem$sendKeysToElement(list(species))

  # tell R to click the search button

  buttonElem$clickElement()

  # get output

  out <- browser$findElement(using = "css", value = "td.cell_1O3UaG:nth-child(4)") # the country origin
  out <- out$getElementText()[[1]] # extract actual text string
  out <- strsplit(out, split = "; ")[[1]] # turns into character vector

  # close browser

  browser$close()

    return(out)
}

# Now let's try it:

get.tree("Abies", "alba")

But after doing all that, I get the following error:

Selenium message:Failed to decode response from marionette Build info: version: '3.6.0', revision: '6fbf3ec767', time: '2017-09-27T16:15:40.131Z' System info: host: 'd260fa60d69b', ip: '172.17.0.2', os.name: 'Linux', os.arch: 'amd64', os.version: '4.9.49-moby', java.version: '1.8.0_131' Driver info: driver.version: unknown

Error: Summary: UnknownError Detail: An unknown server-side error occurred while processing the command. class: org.openqa.selenium.WebDriverException Further Details: run errorDetails method

Anyone have any idea what this means and where I went wrong?

Thanks very much for your help!

1条回答
冷血范
2楼-- · 2019-08-17 19:51

Just take advantage of the XHR request it makes to retrieve the in-line results and toss RSelenium:

library(httr)
library(tidyverse)

get_tree <-  function(genus, species) {

  GET(
    url = sprintf("https://data.bgci.org/treesearch/genus/%s/species/%s", genus, species), 
    add_headers(
      Origin = "http://www.bgci.org", 
      Referer = "http://www.bgci.org/global_tree_search.php?sec=globaltreesearch"
    )
  ) -> res

  stop_for_status(res)

  matches <- content(res, flatten=TRUE)$results[[1]]

  flatten_df(matches[c("id", "taxon", "family", "author", "source", "problems", "distributionDone", "note", "wcsp")]) %>% 
    mutate(geo = list(map_chr(matches$TSGeolinks, "country"))) %>% 
    mutate(taxas = list(map_chr(matches$TSTaxas, "checkTaxon")))

}

xdf <- get_tree("Abies", "alba")

xdf
## # A tibble: 1 x 8
##      id      taxon   family author     source distributionDone        geo      taxas
##   <int>      <chr>    <chr>  <chr>      <chr>            <chr>     <list>     <list>
## 1 58373 Abies alba Pinaceae  Mill. WCSP Phans              yes <chr [21]> <chr [45]>

glimpse(xdf)
## Observations: 1
## Variables: 8
## $ id               <int> 58373
## $ taxon            <chr> "Abies alba"
## $ family           <chr> "Pinaceae"
## $ author           <chr> "Mill."
## $ source           <chr> "WCSP Phans"
## $ distributionDone <chr> "yes"
## $ geo              <list> [<"Albania", "Andorra", "Austria", "Bulgaria", "Croatia", "Czech Republic", "Fr...
## $ taxas            <list> [<"Abies abies", "Abies alba f. columnaris", "Abies alba f. compacta", "Abies a...

It's highly likely you'll need to modify get_tree() at some point but it's better than having Selenium or Splash or phantomjs or Headless Chrome as a dependency.

查看更多
登录 后发表回答