RSelenium: hangs in navigate to direct pdf download

Posted 2019-02-28 06:33

Question:

I am using RSelenium via Docker Toolbox for Windows with the selenium/standalone-firefox-debug container, and everything works fine:

docker run -d -v //c/test/://home/seluser/Downloads -p 4445:4444 -p 5901:5900 selenium/standalone-firefox-debug

I have set up a Firefox profile to download PDFs directly:

fprof <- makeFirefoxProfile(list(browser.startup.homepage = "about:blank"
                                 , startup.homepage_override_url = "about:blank"
                                 , startup.homepage_welcome_url = "about:blank"
                                 , startup.homepage_welcome_url.additional = "about:blank"
                                 , browser.download.dir = "/home/seluser/Downloads"
                                 , browser.download.folderList = 2L
                                 , browser.download.manager.showWhenStarting = FALSE
                                 , browser.download.manager.focusWhenStarting = FALSE
                                 , browser.download.manager.closeWhenDone = TRUE
                                 , browser.helperApps.neverAsk.saveToDisk = "application/pdf, application/octet-stream"
                                 , pdfjs.disabled = TRUE
                                 , plugin.scan.plid.all = FALSE
                                 , plugin.scan.Acrobat = 99L))

With the following code, navigating directly to the PDF downloads it to the specified directory, but the call then hangs and no subsequent code executes.

library(RSelenium)

remDr <- remoteDriver(remoteServerAddr = "*docker-ip*", port = 4445L, extraCapabilities = fprof)
remDr$open()
remDr$navigate("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=BEL&CTRY=USA&DT=09/12/2015&DAY=D&STYLE=EQB")

I have to manually stop the R code and the error that displays is:

Error in checkError(res) : 
Undefined error in httr call. httr output: Operation was aborted by an application callback

If I VNC into the container and look at what is displayed in the browser, the file has downloaded but there is nothing in the address bar.

Any ideas? I assume the httr/RSelenium packages are not receiving some sort of 'loaded' signal from the browser, but this is beyond my troubleshooting ability. This method previously worked with RSelenium and the selenium-standalone-server .jar file.

The sessionInfo() and remDr$open() output is below:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RSelenium_1.7.1

loaded via a namespace (and not attached):
 [1] httr_1.2.1     R6_2.2.0       assertthat_0.1 tools_3.3.2    wdman_0.2.2    binman_0.1.0  
 [7] curl_2.3       Rcpp_0.12.9    jsonlite_1.2   caTools_1.17.1 openssl_0.9.6  bitops_1.0-6  
[13] semver_0.2.0   XML_3.98-1.5  



> remDr$open()
[1] "Connecting to remote server"
$rotatable
[1] FALSE

$raisesAccessibilityExceptions
[1] FALSE

$firefoxOptions
$firefoxOptions$args
list()

$firefoxOptions$profile
[1] "UEsDBBQACAgIAEwPW0oAAAAAAAAAAAAAAAAIAAAAcHJlZnMuanOlkU9LAzEQxe+C36HsSaFmwVv1tNBjbwoey2wy242dZsJM0v36JqJYimWV3vLn/R4z72VF2UbB4a7phadyM5pAUo5m5ANG2GGzXDTQc05PPUHYN/fPtzf5BzuXb/mIIt7hNgv9l52QbDlfiRpwzifPAeZcvnd2PAVicMZ5qUhbbVtFqtp2/fWrs/jA5FA2XlNxeZxTHyCUyUviI09vI4aXupMPu8IOQIp/5Qe2Wa8xsMSK1WDNofadJF9iR6SI0sWoJmBputO9UTjiK6+97j/jjpG8hZp/G92wXJw+sE2YHjQJwuE8zSJ+19KAQk/ofh8jUt75YNRCMMXVGSC6sO2ptLPCPdRSVqsq+wBQSwcII+hBcQsBAAD3AgAAUEsDBBQACAgIAEwPW0oAAAAAAAAAAAAAAAAHAAAAdXNlci5qc51WTW/bMAy971cMOW3AKqTretlOXdcBA4Z1aFDsKMgSbauRJU0fcfPvR/mjSRNHbndKbJMS+fj4yOjBUeugfLconGnxiXhWQvdf6oo0TLXMAQHNCgVi8eFtyZSH91/exJ2nYAFtrHEhudTAVKj7Z4JGG8ln/DWE1rg1qUOwxNbS19uz9Nky788U6CrU6Pjx8vK52xiwAybwR0AAHkB8l86HK4yFK0C34OJhuKbBvB4pr51pgHrupA3URU2DbJLLxXL6osAKTxAOfauvlfEwnc1oLUyrlWEC79KsSsDWpv1Tg14hWgmpaXeLQdng02W0MYKpGexhE4xRnoBzxnGjvVH7cB+n72WljUbUGmgKcKvu0edz8eC9RKtgkAsOfETcSgyUcsd8nfdVUq+JsaApPAZwmqlUzFczqExlvYt6+rIWCuHkBp8Z54DljBoz90gHysEFP4nEU6Wkt4ptQdycL1e/DDInlfbTtDG+Erf6j9RYX3++JBIvMvd3P9FjwQoTw+dCMb1... <truncated>


$appBuildId
[1] "20170125094131"

$version
[1] ""

$platform
[1] "LINUX"

$proxy
named list()

$command_id
[1] 1

$nativeEvents
[1] TRUE

$specificationLevel
[1] 0

$acceptSslCerts
[1] FALSE

$processId
[1] 3012

$webdriver.remote.sessionid
[1] "6263b5ab-9375-425e-aa00-8fc632dc492e"

$browserVersion
[1] "51.0.1"

$platformVersion
[1] "4.4.47-boot2docker"

$XULappId
[1] "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}"

$browserName
[1] "firefox"

$takesScreenshot
[1] TRUE

$javascriptEnabled
[1] TRUE

$takesElementScreenshot
[1] TRUE

$platformName
[1] "linux"

$cssSelectorsEnabled
[1] TRUE

$firefox_profile
[1] "UEsDBBQAAgAIAJRZW0oj6EFxCwEAAPcCAAAIAAAAcHJlZnMuanOlkU9LAzEQxe+C36HsSaFmwVv1tNBjbwoey2wy242dZsJM0v36JqJYimWV3vLn/R4z72VF2UbB4a7phadyM5pAUo5m5ANG2GGzXDTQc05PPUHYN/fPtzf5BzuXb/mIIt7hNgv9l52QbDlfiRpwzifPAeZcvnd2PAVicMZ5qUhbbVtFqtp2/fWrs/jA5FA2XlNxeZxTHyCUyUviI09vI4aXupMPu8IOQIp/5Qe2Wa8xsMSK1WDNofadJF9iR6SI0sWoJmBputO9UTjiK6+97j/jjpG8hZp/G92wXJw+sE2YHjQJwuE8zSJ+19KAQk/ofh8jUt75YNRCMMXVGSC6sO2ptLPCPdRSVqsq+wBQSwECHgAUAAIACACUWVtKI+hBcQsBAAD3AgAACAAAAAAAAAABACAAAAAAAAAAcHJlZnMuanNQSwUGAAAAAAEAAQA2AAAAMQEAAAAA"

$id
[1] "6263b5ab-9375-425e-aa00-8fc632dc492e"

Answer 1:

I had the same problem with the most recent version of Firefox (51.0.1). This was on a Windows machine, and the issue seemed to be related to the pdfjs.disabled flag. The issue was not present in older versions of Firefox; for example, the Docker image tagged 2.53.1 runs Firefox 47. If possible, run an older version (here on a Linux box):

docker run -d -p 4445:4444 -p 5901:5900 -v /home/john/test:/home/seluser/Downloads selenium/standalone-firefox-debug:2.53.1

Running your code now gives:

# Load RSelenium first: makeFirefoxProfile() comes from this package
library(RSelenium)

fprof <- makeFirefoxProfile(list(browser.startup.homepage = "about:blank"
                                 , startup.homepage_override_url = "about:blank"
                                 , startup.homepage_welcome_url = "about:blank"
                                 , startup.homepage_welcome_url.additional = "about:blank"
                                 , browser.download.dir = "/home/seluser/Downloads"
                                 , browser.download.folderList = 2L
                                 , browser.download.manager.showWhenStarting = FALSE
                                 , browser.download.manager.focusWhenStarting = FALSE
                                 , browser.download.manager.closeWhenDone = TRUE
                                 , browser.helperApps.neverAsk.saveToDisk = "application/pdf, application/octet-stream"
                                 , pdfjs.disabled = TRUE
                                 , plugin.scan.plid.all = FALSE
                                 , plugin.scan.Acrobat = 99L))

remDr <- remoteDriver(port = 4445L, extraCapabilities = fprof)
remDr$open()
remDr$navigate("http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=BEL&CTRY=USA&DT=09/12/2015&DAY=D&STYLE=EQB")

> list.files("/home/john/test/")
[1] "eqbPDFChartPlus.cfm"

The PDF would need to be renamed (it is saved with a ColdFusion .cfm extension).
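
The rename can be scripted with base R. This is a sketch under the assumption that every ".cfm" file in the download folder is really a PDF; `rename_cfm_to_pdf` is a hypothetical helper name.

```r
# Hypothetical helper: give every ".cfm" file in `dir` a ".pdf" extension.
# Assumes all ".cfm" files in the folder are PDFs saved under the wrong name.
rename_cfm_to_pdf <- function(dir) {
  old <- list.files(dir, pattern = "\\.cfm$", full.names = TRUE)
  if (length(old) == 0) return(character(0))
  new <- sub("\\.cfm$", ".pdf", old)
  ok  <- file.rename(old, new)
  new[ok]  # paths of the files that were actually renamed
}

# Example call against the host folder mounted into the container:
# rename_cfm_to_pdf("/home/john/test")
```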

As for what is happening with more recent versions of Firefox, that is most likely a question for the geckodriver project. Users of clients other than RSelenium have also run into this recently; see "Can't download PDF with selenium webdriver + firefox".
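
If you have to stay on a newer Firefox, one thing that may be worth trying is capping the page-load wait and catching the resulting error so the script can carry on. This is an untested sketch, not a confirmed fix for the hang: `try_navigate` is a hypothetical wrapper, and it assumes the driver honours the "page load" timeout set through `setTimeout()` when a navigation ends in a download.

```r
# Hypothetical wrapper around remDr$navigate(): set a page-load timeout,
# then convert any navigation error into FALSE instead of stopping the script.
# Assumption: the driver raises an error once the page-load timeout expires.
try_navigate <- function(remDr, url, timeout_ms = 10000) {
  remDr$setTimeout(type = "page load", milliseconds = timeout_ms)
  tryCatch({
    remDr$navigate(url)
    TRUE
  }, error = function(e) {
    message("navigate() did not complete: ", conditionMessage(e))
    FALSE
  })
}

# Example call with an open remoteDriver session `remDr`:
# try_navigate(remDr, "http://www.equibase.com/premium/eqbPDFChartPlus.cfm?RACE=A&BorP=P&TID=BEL&CTRY=USA&DT=09/12/2015&DAY=D&STYLE=EQB")
```

A FALSE result does not mean the download failed, only that the browser never reported the page as loaded, so check the download folder afterwards.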



Tags: r rselenium