HTML page vastly different when using a headless w

Posted 2019-04-29 08:46

I was under the impression that a headless WebKit browser driven through PyQt would automatically give me the full HTML for each URL, even for pages with heavy JavaScript. But I am only getting part of it, compared with the page I get when I save it from a Firefox window.

I am using the following code:

# Imports assume PyQt4; with PySide the module names differ.
import sys
from PyQt4.QtCore import QTimer, QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

# SEC and USER_AGENT are module-level constants defined elsewhere (not shown).


class JabbaWebkit(QWebPage):
    # 'html' is a class variable

    def __init__(self, url, wait, app, parent=None):
        super(JabbaWebkit, self).__init__(parent)
        JabbaWebkit.html = ''

        if wait:
            # quit after a fixed delay
            QTimer.singleShot(wait * SEC, app.quit)
        else:
            # quit as soon as the initial load finishes
            self.loadFinished.connect(app.quit)

        self.mainFrame().load(QUrl(url))

    def save(self):
        JabbaWebkit.html = self.mainFrame().toHtml()

    def userAgentForUrl(self, url):
        return USER_AGENT


def get_page(url, wait=None):
    # here is the trick for calling it several times
    app = QApplication.instance()  # check whether a QApplication already exists

    if not app:  # create a QApplication if it doesn't exist
        app = QApplication(sys.argv)

    form = JabbaWebkit(url, wait, app)
    app.aboutToQuit.connect(form.save)
    app.exec_()
    return JabbaWebkit.html

Can someone see anything obviously wrong with the code?

After running the code through a few URLs, here is one I found that shows the problem I am running into quite clearly: http://www.chilis.com/EN/Pages/menu.aspx

Thanks for any pointers.

Tags: python pyqt pyside

1 answer

女痞

#2 · 2019-04-29 09:06

The page contains AJAX code. After it finishes loading, it still needs some time to update the page via AJAX, but your code quits as soon as loading finishes.

You should add some code like this to wait a while and process events in WebKit:

import time

for i in range(200):  # wait about 2 seconds in total
    app.processEvents()  # handle pending Qt events (network replies, timers)
    time.sleep(0.01)
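The same loop can be wrapped in a small helper so the wait time is explicit and can end early. This is only a sketch: `pump_events`, its parameters, and the optional `done` predicate are hypothetical names that do not appear in the question, and `time.monotonic` assumes Python 3.3 or later. The helper takes the event-processing callable as an argument, so nothing in it depends on a particular Qt binding.

```python
import time


def pump_events(process_events, seconds=2.0, interval=0.01, done=None):
    """Call process_events() repeatedly for up to `seconds` seconds.

    process_events: a callable such as app.processEvents (assumed usage).
    done: optional hypothetical predicate; the loop stops early once it returns True.
    Returns the number of times process_events was called.
    """
    deadline = time.monotonic() + seconds
    calls = 0
    while time.monotonic() < deadline:
        process_events()          # give the event loop a chance to run
        calls += 1
        if done is not None and done():
            break                 # caller says the page is ready
        time.sleep(interval)      # avoid spinning the CPU
    return calls
```

With the code from the question, one possible use is to call `pump_events(app.processEvents, seconds=2)` from a slot connected to `loadFinished`, and only call `app.quit` afterwards, so the AJAX updates have time to land before `toHtml()` is read. How long to wait depends on the page, so the two-second default is a guess rather than a guarantee.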
