“Eager” Page Load Strategy workaround for Chromedr

2019-02-26 15:59发布

问题:

I want to speed up the loading time for pages on selenium because I don't need anything more than the HTML (I am trying to scrape all the links using BeautifulSoup). Using PageLoadStrategy.NONE doesn't work to scrape all the links, and Chrome no longer supports PageLoadStrategy.EAGER. Does anyone know of a workaround to get PageLoadStrategy.EAGER in python?

回答1:

ChromeDriver is the standalone server which implements WebDriver's wire protocol for Chromium. Chrome and Chromium are still in the process of implementing and moving to the W3C standard. Currently ChromeDriver is available for Chrome on Android and Chrome on Desktop (Mac, Linux, Windows and ChromeOS).

As per the current WebDriver W3C Editor's Draft The following is the table of page load strategies that links the pageLoadStrategy capability keyword to a page loading strategy state, and shows which document readiness state that corresponds to it:

However, if you observe the current implementation of of ChromeDriver, the Chrome DevTools does takes into account the following document.readyStates:

  • document.readyState == 'complete'
  • document.readyState == 'interactive'

Here is a sample relevant log:

[1517231304.270][DEBUG]: DEVTOOLS COMMAND Runtime.evaluate (id=11) {
   "expression": "var isLoaded = document.readyState == 'complete' ||    document.readyState == 'interactive';if (isLoaded) {  var frame = document.createElement('iframe');  frame.name = 'chromedriver dummy frame'; ..."
}

As per WebDriver Status you will find the list of all WebDriver commands and their current support in ChromeDriver based on what is in the WebDriver Specification. Once the implementation are completed from all aspects PageLoadStrategy.EAGER is bound to be functionally present within Chrome Driver.



回答2:

You only use normal or none as the pageLoadStrategy in chromdriver. So either choose none and handle everything yourself or wait for the page load as it normally happens