I've set up a simple web scraping script in Python with Selenium and PhantomJS. I have about 200 URLs in total to scrape. The script runs fine at first, then after roughly 20-30 URLs (the exact count varies; the failure seems random and isn't tied to any particular URL) I get the following error in Python:
selenium.common.exceptions.WebDriverException: Message: 'Can not connect to GhostDriver'
And this appears in my ghostdriver.log:
PhantomJS is launching GhostDriver...
[ERROR - 2014-07-04T17:27:37.519Z] GhostDriver - main.fail - {"message":"Could not start Ghost Driver","line":82,"sourceId":140692115795456,"sourceURL":":/ghostdriver/main.js","stack":"Error: Could not start Ghost Driver\n at :/ghostdriver/main.js:82","stackArray":[{"sourceURL":":/ghostdriver/main.js","line":82}]}
I've searched, and most of the questions on SO are from people who can't get even a single URL to load. The only other question I've found where the error occurs partway through a script is this one, and the accepted answer there is to upgrade PhantomJS to the latest version, which I've already done. The other answer simply says to retry the URL, which doesn't seem like a good solution since the same URL could just fail again.
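For completeness, this is roughly what that retry suggestion would look like (just a sketch of my understanding of it; the MAX_RETRIES cap and the fetch_with_retry helper are my own, not from the linked answer):

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

MAX_RETRIES = 3  # arbitrary cap, my own choice

def fetch_with_retry(url):
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap['phantomjs.page.settings.userAgent'] = 'Chrome'
    last_error = None
    for _ in range(MAX_RETRIES):
        try:
            # 'Can not connect to GhostDriver' is raised from this constructor,
            # when Selenium can't reach the phantomjs process it just spawned
            driver = webdriver.PhantomJS(executable_path='/usr/bin/phantomjs',
                                         desired_capabilities=dcap)
            driver.get(url)
            return driver
        except WebDriverException as e:
            last_error = e
    raise last_error

Even with something like this, nothing stops the same URL from failing on all three attempts, which is why I don't consider it a real fix.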
I am running PhantomJS 1.9.7 and Selenium 2.42.1 with Python 2.7.6 on Linux Mint 17. The relevant part of my script:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

for url in ['example.com/1/', 'example.com/2/', 'example.com/3/']:  # ... about 200 URLs in total
    # spoof the user agent and launch a fresh PhantomJS instance for each URL
    user_agent = 'Chrome'
    dcap = dict(DesiredCapabilities.PHANTOMJS)
    dcap['phantomjs.page.settings.userAgent'] = user_agent
    driver = webdriver.PhantomJS(executable_path='/usr/bin/phantomjs', desired_capabilities=dcap)
    driver.get(url)
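One thing I noticed while writing this up: every pass through the loop spawns a new phantomjs process and I never call driver.quit(), so by URL 20-30 there could be dozens of them still running. I don't know if that's what's going on, but a variant that reuses a single instance and cleans up afterwards would look roughly like this:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap['phantomjs.page.settings.userAgent'] = 'Chrome'

# one PhantomJS instance reused for every URL instead of ~200 separate ones
driver = webdriver.PhantomJS(executable_path='/usr/bin/phantomjs',
                             desired_capabilities=dcap)
try:
    for url in ['example.com/1/', 'example.com/2/', 'example.com/3/']:  # ... etc.
        driver.get(url)
        # scraping happens here
finally:
    driver.quit()  # make sure the phantomjs process actually exits

Is the per-URL instance the likely culprit here, or is something else causing GhostDriver to become unreachable?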