So I'm trying to login to Quora using Python and then scrape some stuff.
I'm using Selenium to login to the site. Here's my code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
username = driver.find_element_by_name('email')
password = driver.find_element_by_name('password')
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
driver.close()
Now the questions:
It took ~4 minutes to find and fill the login form, which painfully slow. Is there something I can do to speed up the process?
When it did login, how do I make sure there were no errors? In other words, how do I check the response code?
How do I save cookies with selenium so I can continue scraping once I login?
If there is no way to make selenium faster, is there any other alternative for logging in? (Quora doesn't have an API)
I had a similar problem with very slow find_elements_xxx calls in Python selenium using the ChromeDriver. I eventually tracked down the trouble to a driver.implicitly_wait() call I made prior to my find_element_xxx() calls; when I took it out, my find_element_xxx() calls ran quickly.
Now, I know those elements were there when I did the find_elements_xxx() calls. So I cannot imagine why the implicit_wait should have affected the speed of those operations, but it did.
You can fasten your form filling by using your own setAttribute method, here is code for java for it
public void setAttribute(By locator, String attribute, String value) {
((JavascriptExecutor) getDriver()).executeScript("arguments[0].setAttribute('" + attribute
+ "',arguments[1]);",
getElement(locator),
value);
}
For Windows 7 and IEDRIVER with Python Selenium, Ending the Windows Command Line and restarting it cured my issue.
I was having trouble with find_element..clicks. They were taking 30 seconds plus a little bit. Here's the type of code I have including capturing how long to run.
timeStamp = time.time()
elem = driver.find_element_by_css_selector(clickDown).click()
print("1 took:",time.time() - timeStamp)
timeStamp = time.time()
elem = driver.find_element_by_id("cSelect32").click()
print("2 took:",time.time() - timeStamp)
That was recording about 31 seconds for each click. After ending the command line and restarting it (which does end any IEDRIVERSERVER.exe processes), it was 1 second per click.
Running the web driver headlessly should improve its execution speed to some degree.
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options
options = Options()
options.add_argument('-headless')
browser = webdriver.Firefox(firefox_options=options)
browser.get('https://google.com/')
browser.close()
I have changed locators and this works fast. Also, I have added working with cookies. Check the code below:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys
import pickle
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
wait = WebDriverWait(driver, 5)
username = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="email"]')))
password = wait.until(EC.presence_of_element_located((By.XPATH, '//div[@class="login"]//input[@name="password"]')))
username.send_keys('email')
password.send_keys('password')
password.send_keys(Keys.RETURN)
wait.until(EC.presence_of_element_located((By.XPATH, '//span[text()="Add Question"]'))) # checking that user logged in
pickle.dump( driver.get_cookies() , open("cookies.pkl","wb")) # saving cookies
driver.close()
We have saved cookies and now we will apply them in a new browser:
driver = webdriver.Firefox()
driver.get('http://www.quora.com/')
cookies = pickle.load(open("cookies.pkl", "rb"))
for cookie in cookies:
driver.add_cookie(cookie)
driver.get('http://www.quora.com/')
Hope, this will help.