Building a web crawler - using WebKit packages

Posted 2019-04-10 09:05

Question:

I'm trying to build a web crawler.
I need 2 things:

  • Convert the HTML into a DOM object.
  • Execute the page's existing JavaScript on demand.

The result I expect is a DOM object in which the JavaScript that runs on load has already been executed. I also need an option to execute additional JavaScript on demand (for events such as onMouseOver, onMouseClick, etc.); a sketch of what I have in mind follows the environment details below.

First of all, I couldn't find a good documentation source. I searched the WebKit main page but couldn't find much information aimed at users of the package, and no useful code examples. Also, in some forums I've seen advice not to use the WebKit interface directly for crawlers, but rather the underlying DOM and JavaScript packages.

I'm looking for documentation and code examples, as well as any recommendations on proper usage.

Work environment:
  • OS: Windows
  • Lang: C++
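
To make the second requirement concrete, here is roughly the flow I have in mind, sketched against the WebKit2GTK port's API purely for illustration; I don't know whether the Windows port exposes equivalent calls, and the URL and the "#menu" selector below are placeholders:

    // Sketch only: assumes the WebKit2GTK port (build with
    // `pkg-config --cflags --libs gtk+-3.0 webkit2gtk-4.0`). The Windows/Apple
    // port wraps the same engine in COM interfaces, so the names differ there.
    #include <webkit2/webkit2.h>

    static void on_load_changed(WebKitWebView *view, WebKitLoadEvent event, gpointer)
    {
        if (event != WEBKIT_LOAD_FINISHED)
            return;
        // On-load scripts have already run by now. Trigger an additional,
        // on-demand event by injecting JavaScript into the loaded page.
        // "#menu" is a hypothetical selector used only for this example.
        const char *script =
            "document.querySelector('#menu')"
            ".dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));";
        webkit_web_view_run_javascript(view, script, nullptr, nullptr, nullptr);
    }

    int main(int argc, char **argv)
    {
        gtk_init(&argc, &argv);
        // A real crawler would normally keep the view in an (offscreen) window;
        // a bare view is enough to illustrate the calls.
        WebKitWebView *view = WEBKIT_WEB_VIEW(webkit_web_view_new());
        g_signal_connect(view, "load-changed", G_CALLBACK(on_load_changed), nullptr);
        webkit_web_view_load_uri(view, "https://example.com/");  // placeholder URL
        gtk_main();
        return 0;
    }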

Answer 1:

Check out some of the testing tools packaged alongside the WebKit trunk. Most ports (as far as I know) include DumpRenderTree, which instantiates a WebView and then spits out the render tree after processing a specified file. It's arguably one of the simplest possible examples of using WebKit.
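
The core pattern DumpRenderTree demonstrates (create a view, load some content, inspect the result once loading finishes) looks roughly like the sketch below. It uses the WebKit2GTK API for illustration; the Windows port wraps the same machinery in COM interfaces, so the names will differ there, and the inline HTML string is just a stand-in for a page your crawler fetched:

    // Sketch only: assumes WebKit2GTK (webkit2gtk-4.0). Build with
    // `pkg-config --cflags --libs gtk+-3.0 webkit2gtk-4.0`.
    #include <webkit2/webkit2.h>

    // Receives the value of the injected script: the serialized DOM after
    // the page's on-load JavaScript has already run.
    static void on_js_finished(GObject *object, GAsyncResult *result, gpointer)
    {
        GError *error = nullptr;
        WebKitJavascriptResult *js_result = webkit_web_view_run_javascript_finish(
            WEBKIT_WEB_VIEW(object), result, &error);
        if (!js_result) {
            g_printerr("JavaScript failed: %s\n", error->message);
            g_error_free(error);
        } else {
            char *html = jsc_value_to_string(
                webkit_javascript_result_get_js_value(js_result));
            g_print("%s\n", html);  // DOM as mutated by the on-load script
            g_free(html);
            webkit_javascript_result_unref(js_result);
        }
        gtk_main_quit();
    }

    static void on_load_changed(WebKitWebView *view, WebKitLoadEvent event, gpointer)
    {
        if (event != WEBKIT_LOAD_FINISHED)
            return;
        // Ask the page for its current DOM, serialized back to HTML.
        webkit_web_view_run_javascript(view, "document.documentElement.outerHTML",
                                       nullptr, on_js_finished, nullptr);
    }

    int main(int argc, char **argv)
    {
        gtk_init(&argc, &argv);
        WebKitWebView *view = WEBKIT_WEB_VIEW(webkit_web_view_new());
        g_signal_connect(view, "load-changed", G_CALLBACK(on_load_changed), nullptr);
        // Feed WebKit an HTML string; its onload handler rewrites the body,
        // which the dump above should then reflect.
        webkit_web_view_load_html(view,
            "<html><body onload=\"document.body.innerHTML='<p>generated</p>'\"></body></html>",
            "about:blank");
        gtk_main();
        return 0;
    }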