Web crawler capable of interpreting Javascript in

2019-03-14 04:06发布

问题:

My ultimate goal is to build a web crawler capable of downloading all of the images on a webpage. My understanding from the reading I've done is that I need to embed a rendering/layout engine such as Gecko or Webkit.

Unfortunately, I'm running windows, so PyWebkit is out and short learning C++ for Gecko or Java to use Rhino, I'm not sure where to turn.

Is there a reliable rendering engine with python bindings that will work in windows (64-bit, Windows 7)? Is there an easy way to execute javascript within a python script on windows?

回答1:

You don't need Webkit to do that. All you need it an engine to run Javascript code, so take a look at Gogole V8 or Mozilla SpiderMonkey.

If you're prefer Python to build your crawler, you may want to use PyV8 as it provides all necessary bindings.