This question describes my conclusion after researching available options for creating a headless Chrome instance in Python and asks for confirmation or resources that describe a 'better way'.
From what I've seen it seems that the quickest way to get started with a headless instance of Chrome in a Python application is to use CEF (http://code.google.com/p/chromiumembedded/) with CEFPython (http://code.google.com/p/cefpython/). CEFPython seems premature though, so using it would likely mean further customization before I'm able to load a headless Chrome instance that loads web pages (and required files), resolves a completed DOM and then lets me run arbitrary JS against it from Python.
Have I missed any other projects that are more mature or would make this easier for me?
Any reason you haven't considered Selenium with the Chrome Driver?
http://code.google.com/p/selenium/wiki/ChromeDriver
http://code.google.com/p/selenium/wiki/PythonBindings
This question is 5 years old now and at the time it was a big challenge to run a headless chrome using python, but good news is:
Starting from version 59, released in June 2017, Chrome comes with a headless driver, meaning we can use it in a non-graphical server environment and run tests without having pages visually rendered etc which saves a lot of time and memory for testing or scraping. Setting Selenium for that is very easy:
(I assume that you have installed selenium and chrome driver):
from selenium import webdriver
#set a headless browser
options = webdriver.ChromeOptions()
options.add_argument('headless')
browser = webdriver.Chrome(chrome_options=options)
and now your chrome will run headlessly, if you take out options from the last line, it will show you the browser.
While I'm the author of CasperJS, I invite you to check out Ghost.py, a webkit web client written in Python.
While it's heavily inspired by CasperJS, it's not based on PhantomJS — it still uses PyQt bindings and Webkit though.
casperjs is a headless webkit, but it wouldn't give you python bindings that I know of; it seems command-line oriented, but that doesn't mean you couldn't run it from python in such a way that satisfies what you are after. When you run casperjs, you provide a path to the javascript you want to execute; so you would need to emit that from Python.
But all that aside, I bring up casperjs because it seems to satisfy the lightweight, headless requirement very nicely.
I use this to get the driver:
def get_browser(storage_dir, headless=False):
"""
Get the browser (a "driver").
Parameters
----------
storage_dir : str
headless : bool
Results
-------
browser : selenium webdriver object
"""
# find the path with 'which chromedriver'
path_to_chromedriver = '/usr/local/bin/chromedriver'
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
if headless:
chrome_options.add_argument("--headless")
chrome_options.add_experimental_option('prefs', {
"plugins.plugins_list": [{"enabled": False,
"name": "Chrome PDF Viewer"}],
"download": {
"prompt_for_download": False,
"default_directory": storage_dir,
"directory_upgrade": False,
"open_pdf_in_system_reader": False
}
})
browser = webdriver.Chrome(path_to_chromedriver,
chrome_options=chrome_options)
return browser
By switching the headless
parameter you can either watch it or not.