Take screenshot of full page with Selenium Python

2020-01-27 01:32发布

After trying out various approaches... I have stumbled upon this page to take full-page screenshot with chromedriver, selenium and python.

The original code is here. (and I copy the code in this posting below)

It uses PIL and it works great! However, there is one issue... which is it captures fixed headers and repeats for the whole page and also misses some parts of the page during page change. sample url to take a screenshot:

http://www.w3schools.com/js/default.asp

How to avoid the repeated headers with this code... Or is there any better option which uses python only... ( i don't know java and do not want to use java).

Please see the screenshot of the current result and sample code below.

full page screenshot with repeated headers

test.py

"""
This script uses a simplified version of the one here:
https://snipt.net/restrada/python-selenium-workaround-for-full-page-screenshot-using-chromedriver-2x/

It contains the *crucial* correction added in the comments by Jason Coutu.
"""

import sys

from selenium import webdriver
import unittest

import util

class Test(unittest.TestCase):
    """ Demonstration: Get Chrome to generate fullscreen screenshot """

    def setUp(self):
        self.driver = webdriver.Chrome()

    def tearDown(self):
        self.driver.quit()

    def test_fullpage_screenshot(self):
        ''' Generate document-height screenshot '''
        #url = "http://effbot.org/imagingbook/introduction.htm"
        url = "http://www.w3schools.com/js/default.asp"
        self.driver.get(url)
        util.fullpage_screenshot(self.driver, "test.png")


if __name__ == "__main__":
    unittest.main(argv=[sys.argv[0]])

util.py

import os
import time

from PIL import Image

def fullpage_screenshot(driver, file):

        print("Starting chrome full page screenshot workaround ...")

        total_width = driver.execute_script("return document.body.offsetWidth")
        total_height = driver.execute_script("return document.body.parentNode.scrollHeight")
        viewport_width = driver.execute_script("return document.body.clientWidth")
        viewport_height = driver.execute_script("return window.innerHeight")
        print("Total: ({0}, {1}), Viewport: ({2},{3})".format(total_width, total_height,viewport_width,viewport_height))
        rectangles = []

        i = 0
        while i < total_height:
            ii = 0
            top_height = i + viewport_height

            if top_height > total_height:
                top_height = total_height

            while ii < total_width:
                top_width = ii + viewport_width

                if top_width > total_width:
                    top_width = total_width

                print("Appending rectangle ({0},{1},{2},{3})".format(ii, i, top_width, top_height))
                rectangles.append((ii, i, top_width,top_height))

                ii = ii + viewport_width

            i = i + viewport_height

        stitched_image = Image.new('RGB', (total_width, total_height))
        previous = None
        part = 0

        for rectangle in rectangles:
            if not previous is None:
                driver.execute_script("window.scrollTo({0}, {1})".format(rectangle[0], rectangle[1]))
                print("Scrolled To ({0},{1})".format(rectangle[0], rectangle[1]))
                time.sleep(0.2)

            file_name = "part_{0}.png".format(part)
            print("Capturing {0} ...".format(file_name))

            driver.get_screenshot_as_file(file_name)
            screenshot = Image.open(file_name)

            if rectangle[1] + viewport_height > total_height:
                offset = (rectangle[0], total_height - viewport_height)
            else:
                offset = (rectangle[0], rectangle[1])

            print("Adding to stitched image with offset ({0}, {1})".format(offset[0],offset[1]))
            stitched_image.paste(screenshot, offset)

            del screenshot
            os.remove(file_name)
            part = part + 1
            previous = rectangle

        stitched_image.save(file)
        print("Finishing chrome full page screenshot workaround...")
        return True

19条回答
Deceive 欺骗
2楼-- · 2020-01-27 01:56

Screenshots are limited to the viewport but you can get around this by capturing the body element, as the webdriver will capture the entire element even if it is larger than the viewport. This will save you having to deal with scrolling and stitching images, however you might see problems with footer position (like in the screenshot below).

Tested on Windows 8 and Mac High Sierra with Chrome Driver.

from selenium import webdriver

url = 'https://stackoverflow.com/'
path = '/path/to/save/in/scrape.png'

driver = webdriver.Chrome()
driver.get(url)
el = driver.find_element_by_tag_name('body')
el.screenshot(path)
driver.quit()

Returns: (full size: https://i.stack.imgur.com/ppDiI.png)

SO_scrape

查看更多
【Aperson】
3楼-- · 2020-01-27 01:59

I changed code for Python 3.6, maybe it will be useful for someone:

from selenium import webdriver
from sys import stdout
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import unittest
#from Login_Page import Login_Page
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from io import BytesIO
from PIL import Image

def testdenovoUIavailable(self):
        binary = FirefoxBinary("C:\\Mozilla Firefox\\firefox.exe") 
        self.driver  = webdriver.Firefox(firefox_binary=binary)
        verbose = 0

        #open page
        self.driver.get("http://yandex.ru")

        #hide fixed header        
        #js_hide_header=' var x = document.getElementsByClassName("topnavbar-wrapper ng-scope")[0];x[\'style\'] = \'display:none\';'
        #self.driver.execute_script(js_hide_header)

        #get total height of page
        js = 'return Math.max( document.body.scrollHeight, document.body.offsetHeight,  document.documentElement.clientHeight,  document.documentElement.scrollHeight,  document.documentElement.offsetHeight);'

        scrollheight = self.driver.execute_script(js)
        if verbose > 0:
            print(scrollheight)

        slices = []
        offset = 0
        offset_arr=[]

        #separate full screen in parts and make printscreens
        while offset < scrollheight:
            if verbose > 0: 
                print(offset)

            #scroll to size of page 
            if (scrollheight-offset)<offset:
                #if part of screen is the last one, we need to scroll just on rest of page
                self.driver.execute_script("window.scrollTo(0, %s);" % (scrollheight-offset))
                offset_arr.append(scrollheight-offset)
            else:
                self.driver.execute_script("window.scrollTo(0, %s);" % offset)
                offset_arr.append(offset)

            #create image (in Python 3.6 use BytesIO)
            img = Image.open(BytesIO(self.driver.get_screenshot_as_png()))


            offset += img.size[1]
            #append new printscreen to array
            slices.append(img)


            if verbose > 0:
                self.driver.get_screenshot_as_file('screen_%s.jpg' % (offset))
                print(scrollheight)

        #create image with 
        screenshot = Image.new('RGB', (slices[0].size[0], scrollheight))
        offset = 0
        offset2= 0
        #now glue all images together
        for img in slices:
            screenshot.paste(img, (0, offset_arr[offset2])) 
            offset += img.size[1]
            offset2+= 1      

        screenshot.save('test.png')
查看更多
We Are One
4楼-- · 2020-01-27 01:59

I have modified jeremie-s' answer so that it only get the url once.

browser = webdriver.Chrome(chrome_options=options)
browser.set_window_size(default_width, default_height)
browser.get(url)
height = browser.execute_script("return document.body.parentNode.scrollHeight")

# 2. get screenshot
browser.set_window_size(default_width, height)
browser.save_screenshot(screenshot_path)

browser.quit()
查看更多
聊天终结者
5楼-- · 2020-01-27 02:00

How it works: set browser height as longest as you can...

#coding=utf-8
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def test_fullpage_screenshot(self):
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--start-maximized')
    driver = webdriver.Chrome(chrome_options=chrome_options)
    driver.get("yoururlxxx")
    time.sleep(2)

    #the element with longest height on page
    ele=driver.find_element("xpath", '//div[@class="react-grid-layout layout"]')
    total_height = ele.size["height"]+1000

    driver.set_window_size(1920, total_height)      #the trick
    time.sleep(2)
    driver.save_screenshot("screenshot1.png")
    driver.quit()

if __name__ == "__main__":
    test_fullpage_screenshot()
查看更多
劳资没心,怎么记你
6楼-- · 2020-01-27 02:01

The key is to turn on the headless mode! No stitching required and no need for loading the page twice.

Full working code:

URL = 'http://www.w3schools.com/js/default.asp'

options = webdriver.ChromeOptions()
options.headless = True

driver = webdriver.Chrome(options=options)
driver.get(URL)

S = lambda X: driver.execute_script('return document.body.parentNode.scroll'+X)
driver.set_window_size(S('Width'),S('Height')) # May need manual adjustment
driver.find_element_by_tag_name('body').screenshot('web_screenshot.png')

driver.quit()

This is practically the same code as posted by @Acumenus with slight improvements.

Summary of my findings

I decided to post this anyway because I did not find an explanation about what is happening when the headless mode is turned off (the browser is displayed) for screenshot taking purposes. As I tested (with Chrome WebDriver), if the headless mode is turned on, the screenshot is saved as desired. However, if the headless mode is turned off, the saved screenshot has approximately the correct width and height, but the outcome varies case-by-case. Usually, the upper part of the page which is visible by the screen is saved, but the rest of the image is just plain white. There was also a case with trying to save this Stack Overflow thread by using the above link; even the upper part was not saved which interestingly now was transparent while the rest still white. The last case I noticed was only once with the given W3Schools link; there where no white parts but the upper part of the page repeated until the end, including the header.

I hope this will help for many of those who for some reason are not getting the expected result as I did not see anyone explicitly explaining about the requirement of headless mode with this simple approach. Only when I discovered the solution to this problem myself, I found a post by @vc2279 mentioning that the window of a headless browser can be set to any size (which seems to be true for the opposite case too). Although, the solution in my post improves upon that that it does not require repeated browser/driver opening or page reloading.

Further suggestions

If for some pages it does not work for you, I suggest trying to add time.sleep(seconds) before getting the size of the page. Another case would be if the page requires scrolling until the bottom to load further content, which can be solved by the scheight method from this post:

scheight = .1
while scheight < 9.9:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight/%s);" % scheight)
    scheight += .01

Also, note that for some pages the content may not be in any of the top-level HTML tags like <html> or <body>, for example, YouTube uses <ytd-app> tag. As a last note, I found one page that "returned" a screenshot still with the horizontal scrollbar, the size of the window needed manual adjustment, i.e., the image width needed to be increased by 18 pixels, like so: S('Width')+18.

查看更多
迷人小祖宗
7楼-- · 2020-01-27 02:02

My first answer on StackOverflow. I'm a newbie. The other answers quoted by the fellow expert coders are awesome & I'm not even in the competition. I'd just like to quote the steps taken from the following link: pypi.org

Refer full-page screenshot section.

open your command prompt and navigate to the directory where Python is installed

cd "enter the directory"

install the module using pip

pip install Selenium-Screenshot

The above module works for python 3. once the module is installed, try the following code by creating a separate file in python IDLE

from Screenshot import Screenshot_Clipping
from selenium import webdriver

ob = Screenshot_Clipping.Screenshot()
driver = webdriver.Chrome()
url = "https://github.com/sam4u3/Selenium_Screenshot/tree/master/test"
driver.get(url)

# the line below makes taking & saving screenshots very easy.

img_url=ob.full_Screenshot(driver, save_path=r'.', image_name='Myimage.png')
print(img_url)
driver.close()

driver.quit()
查看更多
登录 后发表回答