Selenium scraping: changing timezone

Posted 2019-07-09 04:39

Question:

The website I scrape with my headless (PhantomJS) browser through Selenium uses a different timezone than mine, so my scraped results show the wrong dates/times for many entries (I'm in EST; the website default looks like GMT).

I'm scraping from this website. You can get an idea of how I'm scraping dates from a previous question on SO here. Note, however, that I'm not currently scraping the times of games, so I'd prefer not to incorporate that into a solution.

The same question is asked here, but I don't know how to test the 'obvious' solution of checking what time the website defaults to. I suppose one would request a time from the client and add/subtract hours from my current time? Can someone please tell me how to do that, and/or whether there's a better way?
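For reference, the add/subtract-hours idea could be sketched roughly as follows. This is only a fallback under stated assumptions: it needs the game time as well as the date, the timestamp format shown is hypothetical (not taken from the site), and `gmt_to_eastern` is an illustrative name. It relies on the standard-library `zoneinfo` module (Python 3.9+), which may need the `tzdata` package on Windows.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; may need the tzdata package on Windows

EASTERN = ZoneInfo("America/New_York")


def gmt_to_eastern(stamp, fmt="%d %b %Y %H:%M"):
    """Parse a GMT timestamp string and return it as an Eastern-time datetime.

    The default format is an assumption for illustration only.
    """
    naive = datetime.strptime(stamp, fmt)
    # Mark the parsed value as UTC, then convert; zoneinfo handles EST vs. EDT.
    return naive.replace(tzinfo=timezone.utc).astimezone(EASTERN)


local = gmt_to_eastern("09 Jan 2019 02:30")
print(local.strftime("%d %b %Y %H:%M"))
```

A game listed shortly after midnight GMT lands on the previous calendar day in Eastern time, which is exactly the kind of date shift described above.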

Edit: what I want is to change the scraped data from the website's default (GMT) to my time (EST). This avoids having to mess with adding hours; the dates will reflect what they are for me.

Here's as far as I've gotten:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
#from selenium.webdriver.support.select import Select

driver = webdriver.PhantomJS(executable_path=r'C:/phantomjs.exe')
driver.get('http://www.oddsportal.com/hockey/usa/nhl/results/')

zoneDropDownID = "timezone-content"

driver.implicitly_wait(5)
zoneDropDownElement = WebDriverWait(driver, 10).until(lambda driver: driver.find_element_by_id(zoneDropDownID))
# Select(zoneDropDownID).select_by_visible_text("Eastern") # strobject has no attribute
test = zoneDropDownID.select_by_visible_text("Eastern").click() # TimeOut exception - not found

driver.close()

But I can't get it to click. Should I be searching for a class instead?

Answer 1:

Just go to this URL before loading the results page:

driver.get('http://www.oddsportal.com/set-timezone/15/')
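A minimal sketch of how this might fit into a scraping session, assuming the site remembers the timezone choice for the rest of the browser session (for example via a cookie), which is not confirmed here. The helper name `load_with_timezone` is hypothetical, and the zone ID 15 is simply the one given in this answer. The function works with any object that offers `get()` and `page_source`, as Selenium drivers do.

```python
BASE_URL = "http://www.oddsportal.com"


def load_with_timezone(driver, results_path, zone_id=15):
    """Visit the set-timezone URL, then open the results page in the same session.

    zone_id=15 is the ID used in the answer above; other IDs are untested.
    """
    # First request: ask the site to switch the session's timezone.
    driver.get("{}/set-timezone/{}/".format(BASE_URL, zone_id))
    # Second request: load the page to scrape, now (presumably) in that timezone.
    driver.get(BASE_URL + results_path)
    return driver.page_source
```

With a real driver this would be called as `load_with_timezone(driver, '/hockey/usa/nhl/results/')`, and the returned page source parsed as before.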


Answer 2:

A better idea for testing is to use chromedriver or something similar. The benefit is that you can check visually what your script is doing. Here is some sample code (without error handling) that does what you want. Please be aware that chromedriver.exe must be in the same location as the script.

import time

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--start-maximized")
chrome_options.add_argument("--lang=en")
chrome = webdriver.Chrome(chrome_options=chrome_options)
wait = WebDriverWait(chrome, 300)

chrome.get("http://www.oddsportal.com/hockey/usa/nhl/results/")

# Open the timezone dropdown in the page header
dropdown = wait.until(EC.presence_of_element_located((By.ID, "user-header-timezone-expander")))
dropdown.click()

userHeader = chrome.find_element_by_id('user-header-timezone')
time.sleep(2)  # give the dropdown time to populate
ahref = userHeader.find_elements_by_tag_name('a')

# Click the first link whose text mentions Eastern Time
for a in ahref:
    print(a.get_attribute("text"))
    if "Eastern Time" in a.get_attribute('text'):
        a.click()
        break  # stop here; the click may reload the page and invalidate the remaining links
time.sleep(10)  # pause so the result can be checked visually
chrome.close()