I want to scrape info from the web, and a previous attempt taught me that Docker would be useful for running my script: I develop on macOS and then run it on a VM (usually Ubuntu), where it often won't run because the dependencies don't exist on Ubuntu and have proven difficult to build.
Docker overcomes the dependency issue, but that leads to a different problem: I need to develop the script in non-headless mode so I can see what it's doing (or at least I think I do), and I don't believe it's possible to run the browser in non-headless mode inside Docker.
How do others overcome this issue or otherwise get around it?
I'm using Python 3 and Selenium on an image that @Harald Norgren helped me build here.
This is the sort of script I'm running. It doesn't really do anything yet; I'm just including it to provide more background, in case it's helpful.
import os
import time

from selenium import webdriver

# Paths relative to the script, so the same layout works on the host and in Docker
project_dir = os.path.dirname(os.path.realpath(__file__))
data_store = project_dir + "/trends-data/"
archive_folder = "archive"
data_archive = data_store + archive_folder + "/"

# Run Chrome headless and download CSVs straight into the data folder
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("--headless")
prefs = {"download.default_directory": data_store}
chromeOptions.add_experimental_option("prefs", prefs)

driver = webdriver.Chrome(
    project_dir + '/chromedriver',
    chrome_options=chromeOptions
)

driver.get('https://trends.google.co.uk/trends/explore?q=query')
time.sleep(5)
driver.find_element_by_class_name("ic_googleplus_reshare").click()
time.sleep(5)
driver.find_element_by_class_name("csv-image").click()
time.sleep(5)
driver.quit()