Waiting for a website to load completely with WebK

Posted 2019-05-20 08:15

Question:

Possible Duplicate:
Webkit GTK: Determine when a document is finished loading

I want to fetch a website's HTML contents with WebKitGTK+ so that JavaScript redirects are handled automatically.

I am using the following Python code:

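import gtk
import webkit


def scanURL(domain, retries=3):
    browser = webkit.WebView()
    browser.open('http://' + domain)

    # WebKitLoadStatus values: 0 provisional, 1 committed, 2 finished,
    # 3 first visually non-empty layout, 4 failed.
    while browser.get_load_status() < 2:
        # Run one non-blocking iteration of the GTK main loop so the
        # load can progress while we wait.
        gtk.main_iteration(False)

    # Status 4 means the load failed: retry a few times before giving up.
    if browser.get_load_status() == 4:
        if retries > 0:
            return scanURL(domain, retries - 1)
        return 'Failed'

    return 'Success'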

The website loads fine, but some websites redirect to a page that in turn redirects somewhere else. I've tried connecting the load-finished signal to a function, and it gets called twice.

Is there a way to know when WebKit has completely loaded a webpage?

How can I know if WebKit is still executing some JavaScript code?

Answer 1:

There is no sure way to do this programmatically for all websites. Some pages trigger their redirects from JavaScript, often via setTimeout after n seconds, and there is no built-in method to detect such "quirks". However, if you are parsing a known group of websites where you know such redirects will happen, you can build a list of those URLs together with the number of seconds after which each redirect fires. After the initial loadFinished is fired, start a QTimer for that delay and connect its timeout signal to a function that fires loadFinished again, so that the next page load is sure to have started while you are waiting for the result. Keep waiting until no new loadStarted signals are fired and no further redirect is expected.
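The bookkeeping described above can be sketched independently of the toolkit. This is only a rough illustration, not a tested WebKit integration: the class name LoadSettleTracker, its methods, and the redirect_delays table are all hypothetical, and wiring on_load_started / on_load_finished to the real load signals is left out. It treats a page as "settled" once a load has finished and no new load has started within the delay listed for that URL.

```python
import time


class LoadSettleTracker(object):
    """Hypothetical helper: a page is 'settled' once a load has finished
    and no new load started during the quiet period for that URL."""

    def __init__(self, redirect_delays=None, default_delay=0.0,
                 clock=time.time):
        # redirect_delays maps a URL to the number of seconds after which
        # its scripted redirect is known to fire (site-specific data that
        # you have to collect yourself).
        self.redirect_delays = redirect_delays or {}
        self.default_delay = default_delay
        self.clock = clock  # injectable so the logic can be tested
        self.loading = False
        self.deadline = None

    def on_load_started(self, url):
        # A new load (for example a redirect) cancels any pending deadline.
        self.loading = True
        self.deadline = None

    def on_load_finished(self, url):
        # Start the quiet period configured for this URL.
        self.loading = False
        delay = self.redirect_delays.get(url, self.default_delay)
        self.deadline = self.clock() + delay

    def is_settled(self):
        # Settled only if nothing is loading and the quiet period is over.
        if self.loading or self.deadline is None:
            return False
        return self.clock() >= self.deadline
```

In a PyGTK program one option would be to poll is_settled() from a gobject.timeout_add callback and only read the HTML once it returns True. Note that this still cannot catch redirects on pages that are not in the table, which is the limitation mentioned above.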