Using Selenium in Python to save a webpage on Firefox

Posted 2019-02-14 23:31

I am trying to use Selenium in Python to save webpages in Firefox on macOS.

So far, I have managed to press COMMAND + S to open the Save As window. However, I don't know how to:

  1. change the directory of the file,
  2. change the name of the file, and
  3. click the SAVE AS button.

Could someone help?

Below is the code I have used to press COMMAND + S:

ActionChains(browser).key_down(Keys.COMMAND).send_keys("s").key_up(Keys.COMMAND).perform()

The reason I am using this method is that I get a UnicodeEncodeError when I:

  1. write the page_source to an HTML file, and
  2. store scraped information in a CSV file.

Write to an HTML file:

file_object = open(completeName, "w")
html = browser.page_source
file_object.write(html)
file_object.close() 

Write to a CSV file:

csv_file_write.writerow(to_write)

Error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128)

5 answers
Lonely孤独者°
Reply #2 · 2019-02-14 23:56
with open('page.html', 'w') as f:
    f.write(driver.page_source)
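
One caveat: the snippet above relies on the default encoding, which is what seems to trigger the UnicodeEncodeError in the question when the page contains characters such as ø. A minimal sketch that sets the encoding explicitly, assuming `driver` is any object exposing a `page_source` string. The helper name `save_page_source` and the `FakeDriver` stub are hypothetical and only there for illustration:

```python
import io


def save_page_source(driver, path, encoding="utf-8"):
    """Write driver.page_source to path using an explicit encoding.

    io.open accepts the encoding argument on both Python 2 and Python 3.
    """
    with io.open(path, "w", encoding=encoding) as f:
        f.write(driver.page_source)
    return path


class FakeDriver(object):
    """Hypothetical stand-in for a Selenium driver, used only for this demo."""
    page_source = u"<html><body>sm\u00f8rrebr\u00f8d</body></html>"
```

With a real browser session you would pass the Selenium driver instead of the stub, e.g. `save_page_source(driver, 'page.html')`.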
贼婆χ
Reply #3 · 2019-02-14 23:58

You cannot interact with system dialogs such as the save file dialog. If you want to save the page HTML, you can do something like this:

page = driver.page_source
file_ = open('page.html', 'w')
file_.write(page)
file_.close()
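
Note that on a page with non-ASCII text this can still raise the UnicodeEncodeError from the question, because no encoding is specified. A small sketch of what goes wrong, assuming the implicit codec is ASCII as the error message indicates (the function name `can_encode` is hypothetical):

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = u"\u00f8"  # the character u'\xf8' from the error message

# ASCII only covers code points 0-127, so this check fails
print(can_encode(sample, "ascii"))
# UTF-8 can represent every Unicode character, so this check passes
print(can_encode(sample, "utf-8"))
```

Writing the file with an explicit UTF-8 encoding avoids the failing ASCII step.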
放荡不羁爱自由
Reply #4 · 2019-02-15 00:07

This is a complete, working example of the answer RemcoW provided:

You first have to install a webdriver, e.g. pip install selenium chromedriver_installer.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# core modules
import codecs
import os

# 3rd party modules
from selenium import webdriver


def get_browser():
    """Get the browser (a "driver")."""
    # find the path with 'which chromedriver'
    path_to_chromedriver = ('/usr/local/bin/chromedriver')
    browser = webdriver.Chrome(executable_path=path_to_chromedriver)
    return browser


save_path = os.path.expanduser('~')
file_name = 'index.html'
browser = get_browser()

url = "https://martin-thoma.com/"
browser.get(url)

complete_name = os.path.join(save_path, file_name)
html = browser.page_source
# codecs.open encodes the text as UTF-8; the with block closes the file
with codecs.open(complete_name, "w", "utf-8") as file_object:
    file_object.write(html)
browser.close()
一纸荒年 Trace。
Reply #5 · 2019-02-15 00:09

What you are trying to achieve is impossible to do with Selenium. The dialog that opens is not something Selenium can interact with.

The closest thing you could do is collect the page_source, which gives you the entire HTML of a single page, and save it to a file.

import codecs
import os

# save_path, file_name and browser are assumed to be defined already
completeName = os.path.join(save_path, file_name)
file_object = codecs.open(completeName, "w", "utf-8")
html = browser.page_source
file_object.write(html)
file_object.close()
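
The CSV half of the question can probably be handled the same way, by choosing the encoding explicitly. A minimal sketch for Python 3, where `open` takes an `encoding` argument and the `csv` module works on text. The function name `write_rows` is made up for illustration, and on Python 2 the `csv` module expects byte strings, so each value would need to be encoded first:

```python
import csv


def write_rows(path, rows, encoding="utf-8"):
    """Write an iterable of rows to a CSV file with an explicit encoding."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding=encoding) as csv_file:
        writer = csv.writer(csv_file)
        for row in rows:
            writer.writerow(row)
```

For example, `write_rows('out.csv', [['name'], ['Søren']])` should write the non-ASCII name without raising an encoding error.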

If you really need to save the entire website, you should look into using a tool like AutoIt. This will make it possible to interact with the save dialog.

乱世女痞
Reply #6 · 2019-02-15 00:12

You can achieve this with the pyautogui library. However, it drives the real keyboard, so if you have to save multiple pages in a loop, you can't do anything else on the screen while it runs.

import time

import pyautogui

# On macOS, use 'command' instead of 'ctrl'
pyautogui.hotkey('ctrl', 's')
time.sleep(1)  # wait for the save dialog to open
pyautogui.typewrite("file name")
time.sleep(1)
pyautogui.press('enter')  # confirm the dialog