Scraping sites with a JavaScript screen delay [closed]

Posted 2019-06-24 02:37

I'm attempting to scrape a site that has a split-second JavaScript delay.

I'm currently using Python for scraping. Whenever I 'get' the page, the JavaScript delay has not finished, so the new DOM has not completely loaded yet.

How would I scrape such a page?

2 answers
Melony?
#2 · 2019-06-24 03:17

A reliable way is to scrape the page through a web browser or a web browser control, e.g. with the iMacros scraping commands. This also works with Python/Linux.
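
For the Python case, the browser-driven approach could look roughly like the sketch below. It assumes a Selenium-style driver object that exposes `get(url)` and `page_source`; the helper name `fetch_rendered_html` and the `is_ready` callback are hypothetical names made up for this illustration, and the readiness check is something you would have to write for your own page.

```python
import time


def fetch_rendered_html(driver, url, is_ready, timeout=10.0, poll=0.1):
    """Load url in a real browser and return the HTML once is_ready(html) is true.

    Assumptions: driver is a Selenium-style object with .get(url) and
    .page_source; is_ready is a caller-supplied function (hypothetical)
    that inspects the HTML and returns True when the script has finished.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source  # re-read the live DOM on every poll
        if is_ready(html):
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page not ready after {timeout} seconds")
        time.sleep(poll)
```

With Selenium installed, this might be called as `fetch_rendered_html(webdriver.Firefox(), url, lambda html: 'id="results"' in html)`, where `results` is only a placeholder for whatever element the delayed script inserts. Polling the live page instead of sleeping for a fixed time means the scrape returns as soon as the content appears and still fails loudly if it never does.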

You can also code this yourself with the WebBrowser control on Windows: http://www.codeproject.com/KB/cs/webbrowser.aspx

疯言疯语
#3 · 2019-06-24 03:31

You can extend Mozilla to build a web scraper that leverages the full power of the web browser. Once all the data has loaded and the DOM has been built, you can extract the data you need from the DOM using XSLT. If the DOM changes dynamically after the initial load, you need some way to wait for those changes before extracting. Visit http://www.gooseeker.com for more information. GooSeeker publishes a similar tool that is free for everyone. Most of the code is in JavaScript and readable, so you can see how it works.
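
One possible way to wait for dynamic changes, sketched in Python since that is what the question uses: keep taking snapshots of the page and treat it as loaded once the snapshot has stopped changing for a short settle period. This is only an illustration, not how GooSeeker does it; the helper name `wait_until_stable` and its parameters are hypothetical, and `get_snapshot` stands for any function that returns the current state of the page.

```python
import time


def wait_until_stable(get_snapshot, settle=0.5, timeout=10.0, poll=0.1):
    """Poll get_snapshot() until its value is unchanged for `settle` seconds.

    get_snapshot is a caller-supplied function (hypothetical), for example
    one that returns the current HTML of a browser-controlled page.
    """
    deadline = time.monotonic() + timeout
    last = get_snapshot()
    last_change = time.monotonic()
    while True:
        if time.monotonic() - last_change >= settle:
            return last  # nothing changed for the whole settle period
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page still changing after {timeout} seconds")
        time.sleep(poll)
        current = get_snapshot()
        if current != last:
            # The DOM changed again, so restart the settle timer.
            last = current
            last_change = time.monotonic()
```

With a Selenium-style driver this might be used as `wait_until_stable(lambda: driver.page_source)`. The settle period is a trade-off: too short and a slow script may be mistaken for a finished one, too long and every scrape pays the extra delay.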
