Python, Selenium and Chromedriver - endless loop u

Posted 2019-08-26 07:25

Question:

Good day to all! I've been struggling with this problem for a week now. I can't solve it myself, and I haven't found a solution in any articles online. Hopefully someone here can help.

My scenario: I need to monitor prices from 6 different tables on one page, and the prices change almost every second. At the end of the day I close the browser (by pressing the X button) and terminate the script (by pressing Control+C), then run it again in the morning and let it run throughout the day. The script is written in Python and uses Selenium to read the prices. The browser is Chrome, the OS is Windows 2008 R2, and the Selenium version is 3.14.1.

Here is part of the code. It simply reads the prices from the tables using find_element_by_id and find_elements_by_id inside an infinite loop with a 1-second interval.

while True:
    close1 = float(browser.find_element_by_id('bnaBox1').find_elements_by_id('lastprc1')[0].text.encode('ascii','ignore'))
    close2 = float(browser.find_element_by_id('bnaBox2').find_elements_by_id('lastprc2')[0].text.encode('ascii','ignore'))
    close3 = float(browser.find_element_by_id('bnaBox3').find_elements_by_id('lastprc3')[0].text.encode('ascii','ignore'))
    close4 = float(browser.find_element_by_id('bnaBox4').find_elements_by_id('lastprc4')[0].text.encode('ascii','ignore'))
    close5 = float(browser.find_element_by_id('bnaBox5').find_elements_by_id('lastprc5')[0].text.encode('ascii','ignore'))
    close6 = float(browser.find_element_by_id('bnaBox6').find_elements_by_id('lastprc6')[0].text.encode('ascii','ignore'))
    time.sleep(1)
...

During the first few minutes of the run, the script consumes a modest amount of CPU (approximately 20 to 30 percent), but after a few more minutes consumption slowly climbs to 100%! There are no other processes running on the machine besides the script.

Troubleshooting I've done so far (none of it solved my issue):

  • upgraded Chrome to the latest version (v71) and ChromeDriver to 2.44
  • rolled back Chrome to previous versions (v62, v68, v69, v70)
  • rolled back Chromedriver version to 2.42 and 2.43
  • cleared my %TEMP% files
  • rebooted machine (multiple times)

The program only reads values from the tables, but I suspect that somewhere in the background, as the script runs, unnecessary data piles up and drives the CPU to the ceiling.

Hoping someone can help me figure out what causes this CPU problem and how to resolve it.

Answer 1:

It is hard to guess the exact cause of the 100% CPU usage without seeing more of your code, in particular the WebDriver configuration. So this answer is based on generic guidelines:

  • Never close the browser by pressing the X button. Always invoke driver.quit() within the tearDown(){} method to close and destroy the WebDriver and Web Client instances gracefully.
    • You can find a detailed discussion in PhantomJS web driver stays in memory
  • Never terminate the script by pressing Control+C. In case zombie WebDriver or web browser instances are left behind, you can remove them programmatically.
    • You can find a detailed discussion in Selenium : How to stop geckodriver process impacting PC memory, without calling driver.quit()?
  • A couple of useful ChromeOptions() and their usage are as follows:

    options.addArguments("start-maximized"); // open Browser in maximized mode
    options.addArguments("disable-infobars"); // disabling infobars
    options.addArguments("--disable-extensions"); // disabling extensions
    options.addArguments("--disable-gpu"); // applicable to windows os only
    options.addArguments("--disable-dev-shm-usage"); // overcome limited resource problems
    options.addArguments("--no-sandbox"); // Bypass OS security model
    
  • Using hardcoded sleeps such as time.sleep(1) is a big no.

    • You can find a detailed discussion in How to sleep webdriver in python for milliseconds
  • In case you are using Chrome in headless mode, there has been a lot of discussion about the unpredictable CPU and memory consumption of Chrome headless sessions.
    • You can find a detailed discussion in Limit chrome headless CPU and memory usage
  • Always keep your Test Environment updated with the latest released binaries as follows:
    • Upgrade ChromeDriver to current ChromeDriver v2.44 level.
    • Keep Chrome version between Chrome v69-71 levels. (as per ChromeDriver v2.44 release notes)
    • Clean your Project Workspace through your IDE and Rebuild your project with required dependencies only.
    • If your base Web Client version is too old, then uninstall it through Revo Uninstaller and install a recent GA and released version of Web Client.
    • Take a System Reboot.
    • Execute your @Test.
  • From Space and Memory Management perspective:
    • (WindowsOS only) Use CCleaner tool to wipe off all the OS chores before and after the execution of your Test Suite.
    • (LinuxOS only) Free Up and Release the Unused/Cached Memory in Ubuntu/Linux Mint before and after the execution of your Test Suite.
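
The ChromeOptions snippet in the list above uses the Java binding's addArguments(), while the question is written in Python, where the corresponding method is add_argument(). A minimal sketch of the same flags for the Python bindings follows. The names CHROME_ARGUMENTS, apply_arguments and build_chrome_options are made up for illustration, and the flag list is copied from the answer as-is rather than verified against any particular Chrome version.

```python
# Flags copied from the answer above; whether each one helps is untested here.
CHROME_ARGUMENTS = [
    "start-maximized",          # open the browser maximized
    "disable-infobars",         # disable infobars
    "--disable-extensions",     # disable extensions
    "--disable-gpu",            # per the answer: applicable to Windows only
    "--disable-dev-shm-usage",  # per the answer: overcome limited resource problems
    "--no-sandbox",             # bypass the OS security model
]


def apply_arguments(options, arguments=CHROME_ARGUMENTS):
    """Add each flag to any object that exposes add_argument()."""
    for argument in arguments:
        options.add_argument(argument)
    return options


def build_chrome_options():
    """Build a ChromeOptions object (hypothetical helper; needs selenium installed)."""
    from selenium import webdriver  # imported lazily so the rest works without it
    return apply_arguments(webdriver.ChromeOptions())
```

With Selenium installed, the result would then be passed to the driver, for example `webdriver.Chrome(options=build_chrome_options())`.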

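The first two guidelines (always call driver.quit(), and do not leave sessions behind after Control+C) can be sketched in Python with try/finally. This is only an illustration under assumptions: monitor, read_prices, handle and max_iterations are hypothetical names, and the polling interval is kept from the question rather than replaced with explicit waits.

```python
import time


def monitor(driver, read_prices, interval=1.0, max_iterations=None, handle=print):
    """Poll read_prices(driver) until interrupted, then always quit the driver.

    Returns the number of completed iterations.
    """
    count = 0
    try:
        while max_iterations is None or count < max_iterations:
            handle(read_prices(driver))
            count += 1
            time.sleep(interval)
    except KeyboardInterrupt:
        # Control+C lands here, so the script can still clean up.
        pass
    finally:
        # Runs on normal exit, on Control+C, and on any other exception.
        driver.quit()
    return count
```

With Selenium installed, this could be called as `monitor(webdriver.Chrome(), read_prices)`, where `read_prices` is your own function wrapping the find_element_by_id lookups from the question.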

Answer 2:

Have you tried releasing memory inside the loop? Maybe by collecting the values (in a list outside the loop?) and then resetting those variables to None, you can avoid excessive memory consumption.

...
while True:

...
    close1 = close2 = close3 = close4 = close5 = close6 = None

...

You can also try forcing the garbage collector:

import gc

while True: 
...
    gc.collect()
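
Before relying on gc.collect(), it may be worth checking whether the Python side of the loop accumulates memory at all. The sketch below uses the standard-library tracemalloc module for that. The helper name python_memory_growth and its step argument are hypothetical, and the measurement covers only the Python process, not Chrome or chromedriver.

```python
import tracemalloc


def python_memory_growth(step, iterations=100):
    """Run step() repeatedly and return the change in traced Python memory, in bytes."""
    tracemalloc.start()
    try:
        step()  # warm-up call so one-time allocations are not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            step()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

If the growth stays near zero when step is one pass of the price-reading loop, the pile-up is more likely in the browser or driver process than in the script's own variables.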

If you think the cause may be a script, another way to detect the problem might be to start Chrome with remote debugging enabled and debug the page:

--remote-debugging-port=9222

I hope some of this helps you.