-->

Render JavaScript to HTML in Python?

2019-04-11 14:22发布

问题:

What

My web-app is made dynamic through Google's AngularJS.

I want static versions of my pages to be generated.

Why

Web-scrapers like Google's execute and render the JavaScript; but don't treat the content the same way as their static equivalents.

References:

  • Does heavy JavaScript use adversely impact Googleability? (Programmers StackExchange)
  • Making AJAX Applications Crawlable (Google Documentation for webmasters)

How

Not sure exactly how—which is why I'm asking—but I want to access the same source that your browser's 'inspect element' presents; rather than the source that: Ctrl+U (View page source) shows.

Once I have a script which renders the page; 'spitting' out the HTML+CSS; I will place those 'generated' files on my web-server. A 'cron' job will then be scheduled to regenerate the files at regular intervals.

These static files will subsequently be served instead of the dynamic ones; when JavaScript is disabled and/or when a scraper 'visits' the site.

回答1:

Here is one solution, however I very much doubt I'll be able to find a public PaaS cloud which can run it:

import spynner

if __name__=='__main__':
    url = "http://angular.github.com/angular-phonecat/step-10/app/#/phones"
    browser = spynner.Browser()
    browser.create_webview(True)
    browser.load(url, load_timeout=60)
    print browser._get_html()
    # ^ Can pipe this to a file, POST it to my server or return it as a string
    browser.close()

Package: Spynner (on Github)