I purchased a VPS — my first ever — which is running CentOS 7 64-bit. I had absolutely zero experience with CentOS 7 until I began using this VPS today, so please go easy on me.
When trying to scrape some dynamically generated content with Scrapy and Selenium, the script ultimately fails and the log throws an error that reads: DevToolsActivePort file doesn't exist
On the very next line, the log pulls up info about the Chrome WebDriver: (Driver info: chromedriver=2.40.565383 ...
Hence I suspect the issue has nothing to do with locating the chromedriver executable.
I've included part of the log below. The script always hangs for an extended period when Selenium is first invoked, then ultimately fails, which is why I haven't included the purely Scrapy part of the log.
The second-to-top answer with 4 votes on this thread reads, "This error message implies that the ChromeDriver was unable to initiate/spawn a new WebBrowser i.e. Chrome Browser session."
I've installed the Chrome Browser according to these instructions from the official repository.
The Chrome binary lives at /usr/bin/google-chrome, while chromedriver is located in the /usr/local/bin/ directory. Both locations have been added to PATH.
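For what it's worth, a quick way to sanity-check that both are actually resolvable from PATH (rather than just sitting in those directories) is Python's shutil.which; the expected paths in the comments are my assumption:

from shutil import which

print(which('google-chrome'))  # expect /usr/bin/google-chrome
print(which('chromedriver'))   # expect /usr/local/bin/chromedriver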
I've tried searching section 7.1 Exceptions in this non-official Selenium documentation for anything related to this error, but came up empty-handed.
When I try to launch Google Chrome on the VPS via SSH, I get an error that reads:
[83526:83526:0622/212649.156252:ERROR:zygote_host_impl_linux.cc(88)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
and it links to a page that is no longer available. When I then open Chrome with the --no-sandbox flag, I instead get:
(google-chrome-stable:85573): Gtk-WARNING **: cannot open display:
[0622/221013.556327:ERROR:nacl_helper_linux.cc(310)] NaCl helper process running without a sandbox! Most likely you need to configure your SUID sandbox correctly
Nothing should be wrong with the code itself, though I'll include it below anyway; the script works fine locally on my own computer.
What's going on? I'm at a loss right now. Any help would be appreciated big time!
The above are just a few of the things I've tried to remedy the situation. I have yet to try messing with the option parameters of webdriver.Chrome(...), but I plan on trying that right after I finish posting this question, along the lines of the sketch below.
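Based on the two errors above, my plan is to run Chrome headless (there's no X display on the VPS) and pass --no-sandbox (since I'm running as root). This is only a sketch of what I intend to try, not a confirmed fix, and --disable-dev-shm-usage is an extra guess aimed at the small /dev/shm some VPSes ship with:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')               # no X display available over SSH
options.add_argument('--no-sandbox')             # Chrome refuses to run as root otherwise
options.add_argument('--disable-gpu')            # commonly recommended alongside --headless
options.add_argument('--disable-dev-shm-usage')  # guess: /dev/shm may be too small on a VPS

driver = webdriver.Chrome(
    executable_path='/usr/local/bin/chromedriver',
    chrome_options=options,
)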
Part of the Log When the Issues Begin
2018-06-22 20:31:22 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:41533/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "chrome", "platformName": "any", "goog:chromeOptions": {"extensions": [], "args": []}}}, "desiredCapabilities": {"browserName": "chrome", "version": "", "platform": "ANY", "goog:chromeOptions": {"extensions": [], "args": []}}}
2018-06-22 20:32:22 [selenium.webdriver.remote.remote_connection] DEBUG: Finished Request
2018-06-22 20:32:22 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.amazon.ca/b/ref=sr_aj?node=2055586011> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
yield next(it)
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
for x in result:
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/local/lib/python3.6/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
return (r for r in result or () if _filter(r))
File "/home/bldsprt/public_html/spiders/selen.py", line 53, in parse
self.driver = webdriver.Chrome('/usr/local/bin/chromedriver')
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 75, in __init__
desired_capabilities=desired_capabilities)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 156, in __init__
self.start_session(capabilities, browser_profile)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 245, in start_session
response = self.execute(Command.NEW_SESSION, parameters)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 314, in execute
self.error_handler.check_response(response)
File "/usr/local/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: DevToolsActivePort file doesn't exist
(Driver info: chromedriver=2.40.565383 (76257d1ab79276b2d53ee976b2c3e3b9f335cde7),platform=Linux 3.10.0-862.3.3.el7.x86_64 x86_64)
2018-06-22 20:32:23 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2018-06-22 20:32:23 [scrapy.core.engine] INFO: Closing spider (finished)
2018-06-22 20:32:23 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 310,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 111488,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 6, 23, 0, 32, 23, 3101),
'log_count/DEBUG': 4,
'log_count/ERROR': 1,
'log_count/INFO': 8,
'memusage/max': 54161408,
'memusage/startup': 46567424,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'spider_exceptions/WebDriverException': 1,
'start_time': datetime.datetime(2018, 6, 23, 0, 31, 21, 19959)}
2018-06-22 20:32:23 [scrapy.core.engine] INFO: Spider closed (finished)
[root@host spiders]#
Part of the Script
# Imports used by this excerpt (at the top of selen.py):
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait

# Inside parse():
self.driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver')
self.driver.get(response.url)
self.driver.set_window_size(960, 540)
self.driver.wait = WebDriverWait(self.driver, 10)
# Grab the "next page" link before shutting the browser down
next_link = self.driver.find_element_by_xpath('//a[@id="pagnNextLink"]')
href = next_link.get_attribute('href')
self.driver.quit()
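As an aside, self.driver.wait is assigned above but never actually used in this excerpt. If the next-page link were injected late by JavaScript, I'd expect to use it with an expected condition along these lines (same XPath as above; this is a sketch, not code from the spider as it stands):

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

next_link = self.driver.wait.until(
    EC.presence_of_element_located((By.XPATH, '//a[@id="pagnNextLink"]'))
)
href = next_link.get_attribute('href')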